DifferentiableTabularPreprocessor
Overview
The DifferentiableTabularPreprocessor integrates preprocessing into the model so that optimal imputation and normalization parameters are learned end-to-end. This approach is particularly useful for tabular data with missing values and features that need normalization.
This layer replaces missing values with learnable imputation vectors and applies learned affine transformations (scaling and shifting) to each feature, making the entire preprocessing pipeline differentiable.
How It Works
The DifferentiableTabularPreprocessor processes tabular data through learnable preprocessing:
- Missing Value Detection: Identifies NaN values in input data
- Learnable Imputation: Replaces missing values with learned imputation vectors
- Affine Transformation: Applies learned scaling (gamma) and shifting (beta) to each feature
- End-to-End Learning: All parameters are learned jointly with the model
- Output Generation: Produces preprocessed features ready for downstream processing
```mermaid
graph TD
    A[Input Features with NaNs] --> B[Missing Value Detection]
    B --> C[Learnable Imputation]
    C --> D[Affine Transformation]
    D --> E[Gamma Scaling]
    E --> F[Beta Shifting]
    F --> G[Preprocessed Features]
    H[Learnable Imputation Vector] --> C
    I[Learnable Gamma Parameters] --> E
    J[Learnable Beta Parameters] --> F

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style D fill:#e1f5fe,stroke:#03a9f4
```
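The pipeline above can be sketched in NumPy. This is illustrative only: `impute`, `gamma`, and `beta` are hand-picked stand-ins for the layer's learnable weights.

```python
import numpy as np

# Toy batch: 2 rows, 3 features, with missing entries encoded as NaN
x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, 6.0]])

# Stand-ins for the layer's learnable per-feature parameters
impute = np.array([0.5, 0.5, 0.5])   # learnable imputation vector
gamma = np.array([1.0, 2.0, 1.0])    # learnable scale
beta = np.array([0.0, 0.0, -1.0])    # learnable shift

mask = np.isnan(x)                   # 1. detect missing values
filled = np.where(mask, impute, x)   # 2. impute with learnable vector
out = gamma * filled + beta          # 3. learned affine transform

print(out)                           # [[1.0, 1.0, 2.0], [0.5, 4.0, 5.0]]
```

During training, gradients flow into `impute`, `gamma`, and `beta`, so the imputation values and the affine transform are tuned for the downstream task rather than fixed up front.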
Why Use This Layer?
| Challenge | Traditional Approach | DifferentiableTabularPreprocessor's Solution |
|---|---|---|
| Missing Values | Separate imputation step (mean, median, etc.) | Learnable imputation optimized for the task |
| Feature Scaling | Static normalization (z-score, min-max) | Learned scaling adapted to the data and task |
| End-to-End Learning | Separate preprocessing and modeling | Integrated preprocessing learned jointly |
| Data Quality | Fixed preprocessing strategies | Adaptive preprocessing that improves with training |
Use Cases
- Missing Data Handling: Intelligent imputation of missing values
- Feature Normalization: Learned scaling and shifting of features
- End-to-End Learning: Integrated preprocessing and modeling
- Tabular Deep Learning: Advanced preprocessing for tabular neural networks
- Data Quality: Adaptive preprocessing that improves with training
Quick Start
Basic Usage
In a Sequential Model
In a Functional Model
Advanced Configuration
API Reference
kerasfactory.layers.DifferentiableTabularPreprocessor
This module implements a DifferentiableTabularPreprocessor layer that integrates preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end. This approach is useful for tabular data with missing values and features that need normalization.
Classes
DifferentiableTabularPreprocessor
A differentiable preprocessing layer for numeric tabular data.
This layer:
- Replaces missing values (NaNs) with a learnable imputation vector.
- Applies a learned affine transformation (scaling and shifting) to each feature.
The idea is to integrate preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_features | int | Number of numeric features in the input. | required |
| name | str \| None | Optional name for the layer. | None |
Input shape
2D tensor with shape: (batch_size, num_features)
Output shape
2D tensor with shape: (batch_size, num_features) (same as input)
Example
1 2 3 4 5 6 7 8 9 10 11 12 13 | |
Initialize the DifferentiableTabularPreprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_features | int | Number of input features. | required |
| name | str \| None | Name of the layer. | None |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code in kerasfactory/layers/DifferentiableTabularPreprocessor.py
Parameters Deep Dive
num_features (int)
- Purpose: Number of numeric features in the input
- Range: 1 to 1000+ (typically 5-100)
- Impact: Must match the last dimension of your input tensor
- Recommendation: Set to the number of features in your dataset
Performance Characteristics
- Speed: Fast - the forward pass is just a NaN mask, a select, and an element-wise affine transform
- Memory: Low - the layer adds only 3 × num_features learnable parameters (imputation vector, gamma, beta)
- Accuracy: Excellent for handling missing values and normalization
- Best For: Tabular data with missing values requiring end-to-end learning
Examples
Example 1: Missing Data Handling
Example 2: Feature-Specific Preprocessing
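The original example was stripped; a NumPy illustration of why per-feature parameters matter - each column gets its own scale and shift, so features on very different ranges are normalized independently. The parameter values here are hand-picked for the illustration, not learned.

```python
import numpy as np

# Two features on wildly different scales: age vs. income
x = np.array([[25.0, 48000.0],
              [37.0, 91000.0],
              [52.0, 64000.0]])

# Per-feature affine parameters - in the real layer these are learned
gamma = np.array([0.1, 1e-4])   # per-feature scale
beta = np.array([-3.0, -6.0])   # per-feature shift

out = gamma * x + beta
print(out)   # both columns now land in a similar single-digit range
```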
Example 3: Comparison with Traditional Preprocessing
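The comparison example was lost; the sketch below contrasts a fixed pipeline (mean imputation plus z-score scaling) with the learnable formulation. The key observation: the fixed pipeline is one point in the layer's parameter space (impute = column mean, gamma = 1/std, beta = -mean/std), which training is then free to improve on.

```python
import numpy as np

x = np.array([[1.0, 10.0],
              [np.nan, 30.0],
              [3.0, np.nan]])

# --- Traditional: fixed mean imputation, then z-score scaling ---
col_mean = np.nanmean(x, axis=0)
col_std = np.nanstd(x, axis=0)
fixed = (np.where(np.isnan(x), col_mean, x) - col_mean) / col_std

# --- Learnable formulation: same math, but as trainable parameters ---
impute = col_mean            # a sensible initialization; training can move it
gamma = 1.0 / col_std
beta = -col_mean / col_std
learned = gamma * np.where(np.isnan(x), impute, x) + beta

print(np.allclose(fixed, learned))   # True - identical at initialization
```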
Tips & Best Practices
- Feature Count: Must match the number of features in your dataset
- Missing Values: Works best with moderate amounts of missing data
- Initialization: Parameters are initialized to reasonable defaults
- End-to-End Learning: Let the model learn optimal preprocessing parameters
- Monitoring: Track learned parameters to understand preprocessing behavior
- Combination: Use with other preprocessing layers for complex pipelines
Common Pitfalls
- Input Shape: Must be 2D tensor (batch_size, num_features)
- Feature Mismatch: num_features must match input dimension
- NaN Handling: Only NaN values are detected as missing; other encodings (e.g., sentinel values such as -999 or empty strings) must be converted to NaN beforehand
- Memory Usage: Creates learnable parameters for each feature
- Overfitting: Can overfit on small datasets with many features
Related Layers
- DifferentialPreprocessingLayer - Advanced differential preprocessing
- DistributionTransformLayer - Distribution transformation
- CastToFloat32Layer - Type casting utility
- FeatureCutout - Feature regularization
Further Reading
- End-to-End Learning in Deep Learning - End-to-end learning concepts
- Missing Data Handling - Missing data techniques
- Feature Normalization - Feature scaling methods
- KerasFactory Layer Explorer - Browse all available layers
- Data Preprocessing Tutorial - Complete guide to data preprocessing