# DifferentialPreprocessingLayer
## Overview
The DifferentialPreprocessingLayer applies multiple candidate transformations to tabular data and learns to combine them optimally. It handles missing values with learnable imputation, and because the entire pipeline is differentiable, the preprocessing strategy is learned end-to-end alongside the downstream model.

This makes the layer particularly useful for tabular data where the best preprocessing is not known in advance: instead of committing to one transformation up front, the model learns the most effective combination during training.
## How It Works
The DifferentialPreprocessingLayer processes data through multiple transformation candidates (a minimal sketch of the computation follows this list):

1. **Missing Value Imputation**: Replaces NaN entries with a learnable imputation vector
2. **Multiple Transformations**: Applies several candidate transformations in parallel:
    - Identity (pass-through)
    - Affine transformation (learnable scaling and bias)
    - Nonlinear transformation via MLP
    - Log transformation (using softplus for positivity)
3. **Learnable Combination**: Uses softmax weights to combine the transformation outputs
4. **End-to-End Learning**: All parameters are learned jointly with the model
5. **Output Generation**: Produces optimally preprocessed features
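The NumPy sketch below mirrors this pipeline step by step. It is illustrative only: names such as `impute` and `alpha`, and the use of a single weight vector shared across features, are assumptions rather than the layer's actual internals.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([[1.0, np.nan, 3.0]])            # one sample, three features
impute = np.zeros(3)                           # learnable imputation vector
x = np.where(np.isnan(x), impute, x)           # step 1: imputation

candidates = np.stack([                        # step 2: candidate transforms
    x,                                         # identity
    2.0 * x + 0.5,                             # affine (learned scale & bias)
    np.tanh(x @ (0.1 * np.ones((3, 3)))),      # stand-in for the MLP branch
    np.log1p(softplus(x)),                     # log branch via softplus
])
alpha = softmax(np.zeros(4))                   # step 3: combination weights
out = np.tensordot(alpha, candidates, axes=1)  # weighted sum -> shape (1, 3)
```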
```mermaid
graph TD
    A[Input Features with NaNs] --> B[Missing Value Imputation]
    B --> C[Multiple Transformation Candidates]
    C --> D[Identity Transform]
    C --> E[Affine Transform]
    C --> F[Nonlinear MLP Transform]
    C --> G[Log Transform]
    D --> H[Softmax Combination Weights]
    E --> H
    F --> H
    G --> H
    H --> I[Weighted Combination]
    I --> J[Preprocessed Features]
    K[Learnable Imputation Vector] --> B
    L[Learnable Alpha Weights] --> H
    style A fill:#e6f3ff,stroke:#4a86e8
    style J fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style H fill:#e1f5fe,stroke:#03a9f4
```
## Why Use This Layer?
| Challenge | Traditional Approach | DifferentialPreprocessingLayer's Solution |
|---|---|---|
| Unknown preprocessing | Manual selection of a preprocessing strategy | Automatic learning of the optimal preprocessing |
| Multiple transformations | A single transformation approach | Multiple candidates with a learned combination |
| Missing values | A separate imputation step | Integrated imputation learned end-to-end |
| Adaptive preprocessing | A fixed preprocessing pipeline | Preprocessing that improves with training |
## Use Cases

- **Unknown Data Characteristics**: The optimal preprocessing strategy is not known in advance
- **Multiple Transformation Needs**: Different features call for different preprocessing approaches
- **End-to-End Learning**: Preprocessing and modeling should be trained jointly
- **Adaptive Preprocessing**: Preprocessing that adapts to data patterns during training
- **Complex Tabular Data**: Datasets too irregular for a single hand-picked transformation
## Quick Start

### Basic Usage
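A minimal sketch (the import path follows the module reference in the API section below; the data is synthetic):

```python
import numpy as np
from kerasfactory.layers import DifferentialPreprocessingLayer

# Synthetic batch: 4 samples, 5 numeric features, one missing value.
x = np.random.rand(4, 5).astype("float32")
x[0, 2] = np.nan

layer = DifferentialPreprocessingLayer(num_features=5)
y = layer(x)           # NaN is replaced by the learned imputation value
print(y.shape)         # (4, 5) -- same shape as the input
```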
### In a Sequential Model
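A sketch of dropping the layer into a `keras.Sequential` stack (layer sizes are illustrative):

```python
import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

model = keras.Sequential([
    keras.Input(shape=(10,)),
    DifferentialPreprocessingLayer(num_features=10),  # learned preprocessing
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```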
### In a Functional Model
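The same idea with the Keras functional API (names and sizes are illustrative):

```python
import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

inputs = keras.Input(shape=(10,), name="numeric_features")
x = DifferentialPreprocessingLayer(num_features=10, name="diff_prep")(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```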
### Advanced Configuration
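A fuller sketch that exercises the documented `mlp_hidden_units` and `name` arguments and trains on synthetic data containing missing values:

```python
import numpy as np
import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

# Synthetic training data with roughly 10% missing values.
rng = np.random.default_rng(42)
x_train = rng.normal(size=(256, 20)).astype("float32")
x_train[rng.random(x_train.shape) < 0.1] = np.nan
y_train = rng.integers(0, 2, size=(256, 1)).astype("float32")

inputs = keras.Input(shape=(20,))
x = DifferentialPreprocessingLayer(
    num_features=20,
    mlp_hidden_units=16,   # wider nonlinear branch for more complex data
    name="preprocessing",
)(inputs)
x = keras.layers.Dense(64, activation="relu")(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
```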
## API Reference

`kerasfactory.layers.DifferentialPreprocessingLayer`
This module implements a DifferentialPreprocessingLayer that applies multiple candidate transformations to tabular data and learns to combine them optimally. It also handles missing values with learnable imputation. This approach is useful for tabular data where the optimal preprocessing strategy is not known in advance.
### Classes

#### DifferentialPreprocessingLayer
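The constructor signature, reconstructed from the parameter tables below:

```python
DifferentialPreprocessingLayer(
    num_features: int,
    mlp_hidden_units: int = 4,
    name: str | None = None,
    **kwargs: Any,
)
```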
Differentiable preprocessing layer for numeric tabular data with multiple candidate transformations.
This layer:

- Imputes missing values using a learnable imputation vector.
- Applies several candidate transformations:
    - Identity (pass-through)
    - Affine transformation (learnable scaling and bias)
    - Nonlinear transformation via a small MLP
    - Log transformation (using a softplus to ensure positivity)
- Learns softmax combination weights to aggregate the candidates.
The entire preprocessing pipeline is differentiable, so the network learns the optimal imputation and transformation jointly with downstream tasks.
**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `num_features` | `int` | Number of numeric features in the input. | *required* |
| `mlp_hidden_units` | `int` | Number of hidden units in the nonlinear branch. | `4` |
| `name` | `str \| None` | Optional name for the layer. | `None` |
**Input shape:** 2D tensor with shape `(batch_size, num_features)`.

**Output shape:** 2D tensor with shape `(batch_size, num_features)` (same as the input).
**Example:**
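A short usage sketch consistent with the shapes documented above (synthetic data):

```python
import numpy as np
from kerasfactory.layers import DifferentialPreprocessingLayer

x = np.array([[1.0, np.nan], [3.0, 4.0]], dtype="float32")

layer = DifferentialPreprocessingLayer(num_features=2, mlp_hidden_units=4)
y = layer(x)      # the NaN entry is imputed before the transforms are applied
print(y.shape)    # (2, 2) -- same shape as the input
```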
Initialize the DifferentialPreprocessingLayer.
**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `num_features` | `int` | Number of input features. | *required* |
| `mlp_hidden_units` | `int` | Number of hidden units in the MLP. | `4` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/DifferentialPreprocessingLayer.py
## Parameters Deep Dive

### `num_features` (int)

- **Purpose**: Number of numeric features in the input
- **Range**: 1 to 1000+ (typically 5-100)
- **Impact**: Must match the last dimension of your input tensor
- **Recommendation**: Set to the number of features in your dataset
### `mlp_hidden_units` (int)

- **Purpose**: Number of hidden units in the nonlinear transformation MLP
- **Range**: 2 to 128+ (typically 4-32)
- **Impact**: Larger values allow more complex nonlinear transformations
- **Recommendation**: Start with 4-8; increase for more complex data
## Performance Characteristics

- **Speed**: Fast; the branches are simple element-wise operations plus a small MLP
- **Memory**: Moderate; multiple transformation branches are computed per forward pass
- **Accuracy**: Excellent for adaptive preprocessing
- **Best For**: Tabular data requiring sophisticated preprocessing strategies
## Examples

### Example 1: Adaptive Preprocessing Analysis
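A sketch of one way to analyze what the layer learned: train on skewed synthetic data, then list the layer's trainable variables. The diagram above calls the combination logits "alpha"; the exact weight names are an assumption to verify against the source.

```python
import numpy as np
import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

rng = np.random.default_rng(0)
x = rng.lognormal(sigma=1.0, size=(512, 6)).astype("float32")  # skewed features
y = (x.sum(axis=1, keepdims=True) > 10).astype("float32")

prep = DifferentialPreprocessingLayer(num_features=6, name="diff_prep")
model = keras.Sequential([
    keras.Input(shape=(6,)),
    prep,
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=5, verbose=0)

# List the learned variables; a softmax over the combination logits (if
# exposed, e.g. as "alpha") shows how heavily each branch is used.
for w in prep.weights:
    print(w.name, w.shape)
```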
### Example 2: Comparison with Single Transformations
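A sketch comparing the learned preprocessing against a fixed `keras.layers.Normalization` baseline on log-normal synthetic data; actual numbers will vary with data and seeds:

```python
import numpy as np
import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

rng = np.random.default_rng(1)
x = rng.lognormal(sigma=1.5, size=(1024, 8)).astype("float32")
y = (np.log1p(x).mean(axis=1, keepdims=True) > 0.9).astype("float32")

def build(prep_layer):
    return keras.Sequential([
        keras.Input(shape=(8,)),
        prep_layer,
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

norm = keras.layers.Normalization()  # fixed, non-learned baseline
norm.adapt(x)

for label, model in [
    ("fixed normalization", build(norm)),
    ("learned preprocessing", build(DifferentialPreprocessingLayer(num_features=8))),
]:
    model.compile(optimizer="adam", loss="binary_crossentropy")
    hist = model.fit(x, y, epochs=5, validation_split=0.2, verbose=0)
    print(label, "val_loss:", hist.history["val_loss"][-1])
```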
### Example 3: Feature-Specific Preprocessing
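A sketch with features engineered to favor different transformations (one well-scaled, one log-normal, one badly scaled) plus scattered missing values. Whether the layer weights its branches per feature or globally is an implementation detail to confirm against the source.

```python
import numpy as np
import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

rng = np.random.default_rng(2)
linear_feat = rng.normal(size=(2048, 1))                 # already well-scaled
skewed_feat = rng.lognormal(sigma=2.0, size=(2048, 1))   # wants a log transform
scaled_feat = rng.normal(size=(2048, 1)) * 100.0         # wants affine rescaling
x = np.concatenate([linear_feat, skewed_feat, scaled_feat], axis=1).astype("float32")
y = (linear_feat + np.log1p(skewed_feat) > 1.0).astype("float32")
x[rng.random(x.shape) < 0.05] = np.nan                   # sprinkle missing values

model = keras.Sequential([
    keras.Input(shape=(3,)),
    DifferentialPreprocessingLayer(num_features=3),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=5, batch_size=64, verbose=0)
```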
## Tips & Best Practices

- **MLP Size**: Start with 4-8 hidden units; increase for more complex data
- **Feature Count**: Set `num_features` to match the number of features in your dataset
- **Missing Data**: Works best with moderate amounts of missing data
- **End-to-End Learning**: Let the model learn the optimal preprocessing rather than hand-tuning it
- **Monitoring**: Track transformation usage (see Example 1) to understand preprocessing behavior
- **Combination**: Compose with other preprocessing layers for more complex pipelines
## Common Pitfalls

- **Input Shape**: Input must be a 2D tensor of shape `(batch_size, num_features)`
- **Feature Mismatch**: `num_features` must match the input's last dimension
- **NaN Handling**: Only NaN is treated as missing; convert other encodings (sentinels such as -999, empty strings) to NaN first, as in the snippet after this list
- **Memory Usage**: The layer materializes multiple transformation branches
- **Overfitting**: Can overfit on small datasets with many features
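For the NaN-handling pitfall, map sentinel encodings to NaN before the layer sees them; a one-line sketch (the -999 convention is just an example):

```python
import numpy as np

x = np.array([[1.0, -999.0], [3.0, 4.0]], dtype="float32")
x = np.where(x == -999.0, np.nan, x)  # sentinel -> NaN so learned imputation applies
```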
## Related Layers
- DifferentiableTabularPreprocessor - Simple differentiable preprocessing
- DistributionTransformLayer - Distribution transformation
- CastToFloat32Layer - Type casting utility
- FeatureCutout - Feature regularization
## Further Reading
- End-to-End Learning in Deep Learning - End-to-end learning concepts
- Missing Data Handling - Missing data techniques
- Feature Transformation - Feature transformation methods
- KerasFactory Layer Explorer - Browse all available layers
- Data Preprocessing Tutorial - Complete guide to data preprocessing