# 📊 DistributionTransformLayer

✅ Stable
🟢 Beginner

## 🎯 Overview

The DistributionTransformLayer automatically transforms numerical features to improve their distribution characteristics, making them more suitable for neural network processing. This layer supports multiple transformation types including log, square root, Box-Cox, Yeo-Johnson, and more, with an intelligent 'auto' mode that selects the best transformation based on data characteristics.

This layer is particularly valuable for preprocessing numerical data where the original distribution may not be optimal for neural network training, such as skewed distributions, heavy-tailed data, or features with varying scales.

πŸ” How It Works

The DistributionTransformLayer processes numerical features through intelligent transformation:

  1. Distribution Analysis: Analyzes input data characteristics (skewness, kurtosis, etc.)
  2. Transformation Selection: Chooses the optimal transformation based on data properties (see the selection sketch after the diagram below)
  3. Parameter Learning: Learns transformation parameters during training
  4. Data Transformation: Applies the selected transformation to normalize the data
  5. Output Generation: Returns transformed features with improved distribution
```mermaid
graph TD
    A[Input Features] --> B[Distribution Analysis]
    B --> C{Transform Type}

    C -->|Auto| D[Best Fit Selection]
    C -->|Manual| E[Specified Transform]

    D --> F[Log Transform]
    D --> G[Box-Cox Transform]
    D --> H[Yeo-Johnson Transform]
    D --> I[Other Transforms]

    E --> F
    E --> G
    E --> H
    E --> I

    F --> J[Transformed Features]
    G --> J
    H --> J
    I --> J

    style A fill:#e6f3ff,stroke:#4a86e8
    style J fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style D fill:#f3e5f5,stroke:#9c27b0
```
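
To make the selection step concrete, here is a minimal, illustrative sketch of how an 'auto' mode can pick a transform from simple moment statistics. This is NumPy-only pseudologic for intuition; the layer's actual selection criteria live in its implementation and may differ.

```python
import numpy as np

def pick_transform(x: np.ndarray) -> str:
    """Illustrative only: choose a transform from simple moment statistics.

    The layer's real 'auto' selection logic may use different criteria.
    """
    x = np.asarray(x, dtype=np.float64)
    mu, sigma = x.mean(), x.std()
    skew = ((x - mu) ** 3).mean() / (sigma**3 + 1e-12)  # sample skewness
    if skew > 1.0:  # strong right skew
        return "log" if x.min() > 0 else "yeo-johnson"
    if skew < -1.0:  # strong left skew
        return "yeo-johnson"
    return "none"

x = np.random.exponential(size=(1000,))
print(pick_transform(x))  # typically 'log' for exponential data
```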

## 💡 Why Use This Layer?

| Challenge | Traditional Approach | DistributionTransformLayer's Solution |
|-----------|----------------------|---------------------------------------|
| Skewed Data | Manual transformation or ignore | 🎯 Automatic detection and transformation of skewed distributions |
| Scale Differences | Manual normalization | ⚡ Intelligent scaling based on data characteristics |
| Distribution Types | One-size-fits-all approach | 🧠 Adaptive transformation for different distribution types |
| Preprocessing Complexity | Manual feature engineering | 🔗 Automated preprocessing with learned parameters |

## 📊 Use Cases

- Financial Data: Transforming skewed financial metrics and ratios
- Medical Data: Normalizing lab values and health measurements
- Sensor Data: Preprocessing IoT and sensor readings
- Survey Data: Transforming rating scales and response distributions
- Time Series: Preprocessing numerical time series features

## 🚀 Quick Start

### Basic Usage

```python
import numpy as np
from kerasfactory.layers import DistributionTransformLayer

# Create sample data with a skewed (exponential) distribution
batch_size, num_features = 32, 10
x = np.random.exponential(size=(batch_size, num_features)).astype("float32")

# Apply automatic transformation
transformer = DistributionTransformLayer(transform_type='auto')
transformed = transformer(x)

print(f"Input shape: {x.shape}")             # (32, 10)
print(f"Output shape: {transformed.shape}")  # (32, 10)
```

### Manual Transformation

```python
# Apply a specific transformation (reusing x from the basic example)
log_transformer = DistributionTransformLayer(transform_type='log')
log_transformed = log_transformer(x)

# Box-Cox transformation
box_cox_transformer = DistributionTransformLayer(
    transform_type='box-cox',
    lambda_param=0.5
)
box_cox_transformed = box_cox_transformer(x)
```

### In a Sequential Model

```python
import keras
from kerasfactory.layers import DistributionTransformLayer

model = keras.Sequential([
    DistributionTransformLayer(transform_type='auto'),  # Preprocess data
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

### In a Functional Model

```python
import keras
from kerasfactory.layers import DistributionTransformLayer

# Define inputs
inputs = keras.Input(shape=(20,))  # 20 numerical features

# Apply distribution transformation
x = DistributionTransformLayer(transform_type='yeo-johnson')(inputs)

# Continue processing
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(32, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)
```

### Advanced Configuration

```python
import keras
from kerasfactory.layers import DistributionTransformLayer

# Advanced configuration with custom parameters
transformer = DistributionTransformLayer(
    transform_type='auto',
    epsilon=1e-8,                    # Custom epsilon for numerical stability
    auto_candidates=['log', 'sqrt', 'box-cox', 'yeo-johnson'],  # Limited candidates
    name="custom_distribution_transform"
)

# Use in a complex preprocessing pipeline
inputs = keras.Input(shape=(50,))

# Multiple transformation strategies
x1 = DistributionTransformLayer(transform_type='log')(inputs)
x2 = DistributionTransformLayer(transform_type='yeo-johnson')(inputs)

# Combine different transformations
x = keras.layers.Concatenate()([x1, x2])
x = keras.layers.Dense(128, activation='relu')(x)
x = keras.layers.Dropout(0.3)(x)
outputs = keras.layers.Dense(5, activation='softmax')(x)

model = keras.Model(inputs, outputs)
```

## 📖 API Reference

### kerasfactory.layers.DistributionTransformLayer

This module implements a DistributionTransformLayer that applies various transformations to make data more normally distributed or to handle specific distribution types better. It's particularly useful for preprocessing data before anomaly detection or other statistical analyses.

Classes

DistributionTransformLayer
```python
DistributionTransformLayer(
    transform_type: str = "none",
    lambda_param: float = 0.0,
    epsilon: float = 1e-10,
    min_value: float = 0.0,
    max_value: float = 1.0,
    clip_values: bool = True,
    auto_candidates: list[str] | None = None,
    name: str | None = None,
    **kwargs: Any
)
```

Layer for transforming data distributions to improve anomaly detection.

This layer applies various transformations to make data more normally distributed or to handle specific distribution types better. Supported transformations include log, square root, Box-Cox, Yeo-Johnson, arcsinh, cube-root, logit, quantile, robust-scale, and min-max.

When transform_type is set to 'auto', the layer automatically selects the most appropriate transformation based on the data characteristics during training.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `transform_type` | `str` | Type of transformation to apply. Options are 'none', 'log', 'sqrt', 'box-cox', 'yeo-johnson', 'arcsinh', 'cube-root', 'logit', 'quantile', 'robust-scale', 'min-max', or 'auto'. | `'none'` |
| `lambda_param` | `float` | Parameter for parameterized transformations like Box-Cox and Yeo-Johnson. | `0.0` |
| `epsilon` | `float` | Small value added to prevent numerical issues like log(0). | `1e-10` |
| `min_value` | `float` | Minimum value for min-max scaling. | `0.0` |
| `max_value` | `float` | Maximum value for min-max scaling. | `1.0` |
| `clip_values` | `bool` | Whether to clip values to the specified range in min-max scaling. | `True` |
| `auto_candidates` | `list[str] \| None` | List of transformation types to consider when transform_type is 'auto'. If None, all available transformations are considered. | `None` |
| `name` | `str \| None` | Optional name for the layer. | `None` |
Input shape

N-D tensor with shape: (batch_size, ..., features)

Output shape

Same shape as input: (batch_size, ..., features)

Example
```python
import numpy as np
from kerasfactory.layers import DistributionTransformLayer

# Create sample input data with a skewed distribution
x = np.random.exponential(size=(32, 10)).astype("float32")  # 32 samples, 10 features

# Apply log transformation
log_transform = DistributionTransformLayer(transform_type="log")
y = log_transform(x)
print("Transformed output shape:", y.shape)  # (32, 10)

# Apply Box-Cox transformation with lambda=0.5
box_cox = DistributionTransformLayer(transform_type="box-cox", lambda_param=0.5)
z = box_cox(x)

# Apply arcsinh transformation (handles both positive and negative values)
arcsinh_transform = DistributionTransformLayer(transform_type="arcsinh")
a = arcsinh_transform(x)

# Apply min-max scaling to range [0, 1]
min_max = DistributionTransformLayer(transform_type="min-max", min_value=0.0, max_value=1.0)
b = min_max(x)

# Use automatic transformation selection
auto_transform = DistributionTransformLayer(transform_type="auto")
c = auto_transform(x)  # Will select the best transformation during training
```

Initialize the DistributionTransformLayer.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `transform_type` | `str` | Type of transformation to apply. | `'none'` |
| `lambda_param` | `float` | Lambda parameter for Box-Cox transformation. | `0.0` |
| `epsilon` | `float` | Small value to avoid division by zero. | `1e-10` |
| `min_value` | `float` | Minimum value for clipping. | `0.0` |
| `max_value` | `float` | Maximum value for clipping. | `1.0` |
| `clip_values` | `bool` | Whether to clip values. | `True` |
| `auto_candidates` | `list[str] \| None` | List of candidate transformations for auto mode. | `None` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/DistributionTransformLayer.py
```python
def __init__(
    self,
    transform_type: str = "none",
    lambda_param: float = 0.0,
    epsilon: float = 1e-10,
    min_value: float = 0.0,
    max_value: float = 1.0,
    clip_values: bool = True,
    auto_candidates: list[str] | None = None,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DistributionTransformLayer.

    Args:
        transform_type: Type of transformation to apply.
        lambda_param: Lambda parameter for Box-Cox transformation.
        epsilon: Small value to avoid division by zero.
        min_value: Minimum value for clipping.
        max_value: Maximum value for clipping.
        clip_values: Whether to clip values.
        auto_candidates: List of candidate transformations for auto mode.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._transform_type = transform_type
    self._lambda_param = lambda_param
    self._epsilon = epsilon
    self._min_value = min_value
    self._max_value = max_value
    self._clip_values = clip_values
    self._auto_candidates = auto_candidates

    # Set public attributes BEFORE calling parent's __init__
    self.transform_type = self._transform_type
    self.lambda_param = self._lambda_param
    self.epsilon = self._epsilon
    self.min_value = self._min_value
    self.max_value = self._max_value
    self.clip_values = self._clip_values
    self.auto_candidates = self._auto_candidates

    # Define valid transformations
    self._valid_transforms = [
        "none",
        "log",
        "sqrt",
        "box-cox",
        "yeo-johnson",
        "arcsinh",
        "cube-root",
        "logit",
        "quantile",
        "robust-scale",
        "min-max",
        "auto",
    ]

    # Set default auto candidates if not provided
    if self.auto_candidates is None and self.transform_type == "auto":
        # Exclude 'none' and 'auto' from candidates
        self.auto_candidates = [
            t for t in self._valid_transforms if t not in ["none", "auto"]
        ]

    # Validate parameters
    self._validate_params()

    # Initialize auto-mode variables
    self._selected_transform = None
    self._is_initialized = False

    # Call parent's __init__
    super().__init__(name=name, **kwargs)
```

## 🔧 Parameters Deep Dive

### transform_type (str)

- Purpose: Type of transformation to apply
- Options: 'none', 'log', 'sqrt', 'box-cox', 'yeo-johnson', 'arcsinh', 'cube-root', 'logit', 'quantile', 'robust-scale', 'min-max', 'auto'
- Default: 'none'
- Impact: Determines how the data is transformed
- Recommendation: Use 'auto' for automatic selection and a specific type for known distributions (a quick comparison sketch follows below)
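
For a quick feel of what the options do, the snippet below runs several of the documented transforms on the same positive-valued batch and compares output ranges. It assumes the layer accepts a NumPy batch like any Keras layer; 'box-cox', 'logit', 'quantile', and 'robust-scale' are omitted here because they carry extra parameter or input-range requirements.

```python
import numpy as np
import keras
from kerasfactory.layers import DistributionTransformLayer

# Strictly positive sample so log/sqrt-style transforms are valid
x = (np.abs(np.random.randn(8, 4)) + 0.1).astype("float32")

for t in ["none", "log", "sqrt", "arcsinh", "cube-root", "min-max"]:
    y = keras.ops.convert_to_numpy(DistributionTransformLayer(transform_type=t)(x))
    print(f"{t:10s} min={y.min():+.3f} max={y.max():+.3f}")
```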

### lambda_param (float)

- Purpose: Parameter for the Box-Cox and Yeo-Johnson transformations
- Range: -2.0 to 2.0 (typically 0.0 to 1.0)
- Impact: Controls the strength of the transformation
- Recommendation: Use 0.5 for a moderate transformation and 0.0 for log-like behavior (see the reference formula below)
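
For reference, this is the textbook Box-Cox definition that lambda_param parameterizes, as a NumPy sketch independent of the layer's internals: y = (x^λ − 1)/λ for λ ≠ 0, and y = log(x) at λ = 0.

```python
import numpy as np

def box_cox(x: np.ndarray, lam: float, eps: float = 1e-10) -> np.ndarray:
    """Textbook Box-Cox: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0."""
    x = np.asarray(x, dtype=np.float64) + eps  # Box-Cox requires positive inputs
    if lam == 0.0:
        return np.log(x)
    return (x**lam - 1.0) / lam

x = np.random.exponential(size=(1000,))
print(np.allclose(box_cox(x, 0.0), np.log(x + 1e-10)))  # True: lambda=0 is exactly log
```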

### epsilon (float)

- Purpose: Small value to prevent numerical issues
- Range: 1e-10 to 1e-6
- Impact: Prevents log(0) and division-by-zero errors
- Recommendation: Use 1e-8 for most cases and 1e-10 when features contain very small values (see the demonstration below)
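
The effect of epsilon is easy to see on data containing zeros:

```python
import numpy as np

x = np.array([0.0, 1.0, 10.0])
with np.errstate(divide="ignore"):
    print(np.log(x))       # [-inf 0. 2.303...]: the failure epsilon guards against
print(np.log(x + 1e-8))    # finite everywhere; larger values barely change
```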

## 📈 Performance Characteristics

- Speed: ⚡⚡⚡⚡ Very fast; simple element-wise mathematical transformations
- Memory: 💾💾 Low memory usage; minimal additional parameters
- Accuracy: 🎯🎯🎯🎯 Excellent for improving data distribution characteristics
- Best For: Numerical data with skewed or non-normal distributions

## 🎨 Examples

### Example 1: Financial Data Preprocessing

```python
import keras
import numpy as np
from kerasfactory.layers import DistributionTransformLayer

# Simulate financial data with different distributions
batch_size = 1000

# Income data (log-normal distribution)
income = np.random.lognormal(mean=10, sigma=1, size=(batch_size, 1))

# Age data (normal distribution)
age = np.random.normal(50, 15, size=(batch_size, 1))

# Debt ratio (beta distribution)
debt_ratio = np.random.beta(2, 5, size=(batch_size, 1))

# Combine features
financial_data = np.concatenate([income, age, debt_ratio], axis=1)

# Build preprocessing model
inputs = keras.Input(shape=(3,))

# Apply different transformations for different features
income_transformed = DistributionTransformLayer(transform_type='log')(inputs[:, :1])
age_transformed = DistributionTransformLayer(transform_type='none')(inputs[:, 1:2])
debt_transformed = DistributionTransformLayer(transform_type='logit')(inputs[:, 2:3])

# Combine transformed features
x = keras.layers.Concatenate()([income_transformed, age_transformed, debt_transformed])
x = keras.layers.Dense(32, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
output = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, output)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

### Example 2: Sensor Data Preprocessing

```python
import keras
from kerasfactory.layers import DistributionTransformLayer

# Preprocess IoT sensor data with automatic transformation
def create_sensor_model():
    inputs = keras.Input(shape=(10,))  # 10 sensor readings

    # Automatic transformation selection
    x = DistributionTransformLayer(transform_type='auto')(inputs)

    # Additional preprocessing
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Dense(64, activation='relu')(x)
    x = keras.layers.Dropout(0.3)(x)

    # Multiple outputs
    anomaly_score = keras.layers.Dense(1, activation='sigmoid', name='anomaly')(x)
    sensor_health = keras.layers.Dense(3, activation='softmax', name='health')(x)

    return keras.Model(inputs, [anomaly_score, sensor_health])

model = create_sensor_model()
model.compile(
    optimizer='adam',
    loss={'anomaly': 'binary_crossentropy', 'health': 'categorical_crossentropy'},
    loss_weights={'anomaly': 1.0, 'health': 0.5}
)
```

### Example 3: Survey Data Analysis

```python
import keras
from kerasfactory.layers import DistributionTransformLayer

# Process survey data with different response scales
def create_survey_model():
    inputs = keras.Input(shape=(15,))  # 15 survey questions

    # Different transformations for different question types
    # Likert scale (1-5) - no transformation needed
    likert_questions = inputs[:, :5]

    # Rating scale (0-10) - min-max scaling
    rating_questions = DistributionTransformLayer(transform_type='min-max')(inputs[:, 5:10])

    # Open-ended numerical - log transformation
    numerical_questions = DistributionTransformLayer(transform_type='log')(inputs[:, 10:15])

    # Combine all features
    x = keras.layers.Concatenate()([likert_questions, rating_questions, numerical_questions])
    x = keras.layers.Dense(64, activation='relu')(x)
    x = keras.layers.Dropout(0.2)(x)
    x = keras.layers.Dense(32, activation='relu')(x)

    # Survey analysis outputs
    satisfaction = keras.layers.Dense(1, activation='sigmoid', name='satisfaction')(x)
    category = keras.layers.Dense(5, activation='softmax', name='category')(x)

    return keras.Model(inputs, [satisfaction, category])

model = create_survey_model()
model.compile(
    optimizer='adam',
    loss={'satisfaction': 'binary_crossentropy', 'category': 'categorical_crossentropy'},
    loss_weights={'satisfaction': 1.0, 'category': 0.3}
)
```

## 💡 Tips & Best Practices

- Auto Mode: Use 'auto' for unknown distributions and specific types for known patterns
- Data Validation: Check for negative values before applying log transformations
- Epsilon Tuning: Adjust epsilon based on your data's numerical precision
- Feature-Specific: Apply different transformations to different feature types
- Monitoring: Track transformation effects on model performance
- Inverse Transform: Consider whether you need to inverse transform predictions back to the original scale (see the sketch below)
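
If you do need predictions back on the original scale, you generally have to invert the transform yourself; nothing here states that the layer exposes an inverse. A manual inverse for the simple min-max case might look like the hypothetical helper below.

```python
import numpy as np

def inverse_min_max(y, data_min, data_max, min_value=0.0, max_value=1.0):
    # Hypothetical helper (not part of the layer's API): undo a min-max
    # scaling to [min_value, max_value], given the original feature's range.
    scale = (data_max - data_min) / (max_value - min_value)
    return (y - min_value) * scale + data_min

x = np.random.uniform(5.0, 20.0, size=100)
y = (x - x.min()) / (x.max() - x.min())  # forward min-max to [0, 1]
print(np.allclose(inverse_min_max(y, x.min(), x.max()), x))  # True
```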

## ⚠️ Common Pitfalls

- Negative Values: Log and sqrt transformations require non-negative values (demonstrated below)
- Zero Values: Use an appropriate epsilon to handle zero values
- Overfitting: Don't over-transform; sometimes the original distribution is fine
- Interpretability: Transformed features may be harder to interpret
- Inverse Transform: Remember to inverse transform predictions if you report them on the original scale
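
To illustrate the negative-value pitfall: log is undefined on non-positive inputs, while arcsinh is defined on the whole real line and behaves log-like in the tails, which makes it a safer default for signed data.

```python
import numpy as np

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])

with np.errstate(invalid="ignore", divide="ignore"):
    print(np.log(x))     # [nan nan -inf -0.693 1.609]: log breaks on x <= 0

print(np.arcsinh(x))     # defined everywhere, symmetric around zero
```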

## 📚 Further Reading