🔧 DifferentiableTabularPreprocessor

🔴 Advanced ✅ Stable 🔥 Popular

🎯 Overview

The DifferentiableTabularPreprocessor integrates preprocessing into the model so that optimal imputation and normalization parameters are learned end-to-end. This approach is particularly useful for tabular data with missing values and features that need normalization.

This layer replaces missing values with learnable imputation vectors and applies learned affine transformations (scaling and shifting) to each feature, making the entire preprocessing pipeline differentiable.
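
As a sketch of what "differentiable" means here, the transformation and its gradients can be written out as follows. This assumes imputation happens before the affine step, and uses $c$, $\gamma$, $\beta$ as shorthand for the layer's `impute`, `gamma`, and `beta` vectors; the notation is illustrative and not taken from the library source.

```latex
\tilde{x}_{ij} =
\begin{cases}
c_j    & \text{if } x_{ij} \text{ is NaN} \\
x_{ij} & \text{otherwise}
\end{cases}
\qquad
y_{ij} = \gamma_j \, \tilde{x}_{ij} + \beta_j

\frac{\partial \mathcal{L}}{\partial \gamma_j} = \sum_i \frac{\partial \mathcal{L}}{\partial y_{ij}} \, \tilde{x}_{ij},
\qquad
\frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_i \frac{\partial \mathcal{L}}{\partial y_{ij}},
\qquad
\frac{\partial \mathcal{L}}{\partial c_j} = \gamma_j \sum_{i \,:\, x_{ij} \text{ is NaN}} \frac{\partial \mathcal{L}}{\partial y_{ij}}
```

Under this formulation, the imputation value for feature $j$ receives gradient only from rows where that feature is missing, so a feature that is never missing in the training data leaves its imputation value untrained.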

🔍 How It Works

The DifferentiableTabularPreprocessor processes tabular data through learnable preprocessing:

Missing Value Detection: Identifies NaN values in input data
Learnable Imputation: Replaces missing values with learned imputation vectors
Affine Transformation: Applies learned scaling (gamma) and shifting (beta) to each feature
End-to-End Learning: All parameters are learned jointly with the model
Output Generation: Produces preprocessed features ready for downstream processing

graph TD
    A[Input Features with NaNs] --> B[Missing Value Detection]
    B --> C[Learnable Imputation]
    C --> D[Affine Transformation]
    D --> E[Gamma Scaling]
    E --> F[Beta Shifting]
    F --> G[Preprocessed Features]

    H[Learnable Imputation Vector] --> C
    I[Learnable Gamma Parameters] --> E
    J[Learnable Beta Parameters] --> F

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style D fill:#e1f5fe,stroke:#03a9f4
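
The pipeline above can be sketched in plain NumPy. This is a reference sketch only, not the layer's implementation: the function name `preprocess_reference` is hypothetical, the parameter values are made up for illustration, and the order (impute first, then scale and shift) is an assumption based on the steps listed above.

```python
import numpy as np

def preprocess_reference(x, impute, gamma, beta):
    """NumPy sketch: replace NaNs per feature, then apply a per-feature affine map."""
    x = np.asarray(x, dtype=np.float32)
    # Vectors of shape (num_features,) broadcast across the batch dimension
    imputed = np.where(np.isnan(x), impute, x)
    return gamma * imputed + beta

x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, 3.0]], dtype=np.float32)

# Illustrative parameter values (assumptions, not the layer's initial values)
impute = np.array([10.0, 20.0, 30.0], dtype=np.float32)
gamma = np.array([2.0, 2.0, 2.0], dtype=np.float32)
beta = np.array([1.0, 1.0, 1.0], dtype=np.float32)

y = preprocess_reference(x, impute, gamma, beta)
print(y)  # rows: [3, 41, 7] and [21, 5, 7]
```

In the real layer the three vectors are trainable weights, so their values are updated by backpropagation instead of being fixed as they are here.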

💡 Why Use This Layer?

| Challenge | Traditional Approach | DifferentiableTabularPreprocessor's Solution |
|---|---|---|
| Missing Values | Separate imputation step (mean, median, etc.) | 🎯 Learnable imputation optimized for the task |
| Feature Scaling | Static normalization (z-score, min-max) | ⚡ Learned scaling adapted to data and task |
| End-to-End Learning | Separate preprocessing and modeling | 🧠 Integrated preprocessing learned jointly |
| Data Quality | Fixed preprocessing strategies | 🔗 Adaptive preprocessing that improves with training |

📊 Use Cases

Missing Data Handling: Intelligent imputation of missing values
Feature Normalization: Learned scaling and shifting of features
End-to-End Learning: Integrated preprocessing and modeling
Tabular Deep Learning: Advanced preprocessing for tabular neural networks
Data Quality: Adaptive preprocessing that improves with training

🚀 Quick Start

Basic Usage

import keras
import numpy as np
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Create sample data with missing values
x = keras.ops.convert_to_tensor([
    [1.0, np.nan, 3.0, 4.0, 5.0],
    [2.0, 2.0, np.nan, 4.0, 5.0],
    [np.nan, 2.0, 3.0, 4.0, np.nan]
], dtype="float32")

# Apply differentiable preprocessing
preprocessor = DifferentiableTabularPreprocessor(num_features=5)
preprocessed = preprocessor(x)

print(f"Input shape: {x.shape}")           # (3, 5)
print(f"Output shape: {preprocessed.shape}")  # (3, 5)
print(f"Has NaNs: {keras.ops.any(keras.ops.isnan(preprocessed))}")  # False

In a Sequential Model

import keras
from kerasfactory.layers import DifferentiableTabularPreprocessor

model = keras.Sequential([
    DifferentiableTabularPreprocessor(num_features=10),  # Preprocess first
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In a Functional Model

import keras
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Define inputs
inputs = keras.Input(shape=(15,))  # 15 features

# Apply differentiable preprocessing
x = DifferentiableTabularPreprocessor(num_features=15)(inputs)

# Continue processing
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(32, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

Advanced Configuration

import keras
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Advanced configuration with custom preprocessing
def create_advanced_preprocessing_model():
    inputs = keras.Input(shape=(20,))

    # Apply differentiable preprocessing
    x = DifferentiableTabularPreprocessor(
        num_features=20,
        name="learnable_preprocessing"
    )(inputs)

    # Multi-branch processing
    branch1 = keras.layers.Dense(32, activation='relu')(x)
    branch1 = keras.layers.Dense(16, activation='relu')(branch1)

    branch2 = keras.layers.Dense(32, activation='tanh')(x)
    branch2 = keras.layers.Dense(16, activation='tanh')(branch2)

    # Combine branches
    x = keras.layers.Concatenate()([branch1, branch2])
    x = keras.layers.Dense(64, activation='relu')(x)
    x = keras.layers.Dropout(0.3)(x)

    # Multi-task output
    classification = keras.layers.Dense(3, activation='softmax', name='classification')(x)
    regression = keras.layers.Dense(1, name='regression')(x)

    return keras.Model(inputs, [classification, regression])

model = create_advanced_preprocessing_model()
model.compile(
    optimizer='adam',
    loss={'classification': 'categorical_crossentropy', 'regression': 'mse'},
    loss_weights={'classification': 1.0, 'regression': 0.5}
)

📖 API Reference

kerasfactory.layers.DifferentiableTabularPreprocessor

This module implements a DifferentiableTabularPreprocessor layer that integrates preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end. This approach is useful for tabular data with missing values and features that need normalization.

Classes

DifferentiableTabularPreprocessor

DifferentiableTabularPreprocessor(
    num_features: int,
    name: str | None = None,
    **kwargs: Any
)

A differentiable preprocessing layer for numeric tabular data.

This layer:

Replaces missing values (NaNs) with a learnable imputation vector.
Applies a learned affine transformation (scaling and shifting) to each feature.

The idea is to integrate preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_features` | `int` | Number of numeric features in the input. | required |
| `name` | `str \| None` | Optional name for the layer. | `None` |

Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example

import keras
import numpy as np
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Suppose we have tabular data with 5 numeric features
x = keras.ops.convert_to_tensor([
    [1.0, np.nan, 3.0, 4.0, 5.0],
    [2.0, 2.0, np.nan, 4.0, 5.0]
], dtype="float32")

preproc = DifferentiableTabularPreprocessor(num_features=5)
y = preproc(x)
print(y)

Initialize the DifferentiableTabularPreprocessor.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_features` | `int` | Number of input features. | required |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |

Source code in kerasfactory/layers/DifferentiableTabularPreprocessor.py

def __init__(
    self,
    num_features: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DifferentiableTabularPreprocessor.

    Args:
        num_features: Number of input features.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.num_features = num_features

    # Initialize instance variables
    self.impute = None
    self.gamma = None
    self.beta = None

    # Validate parameters during initialization
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🔧 Parameters Deep Dive

`num_features` (int)

Purpose: Number of numeric features in the input
Range: 1 to 1000+ (typically 5-100)
Impact: Must match the last dimension of your input tensor
Recommendation: Set to the number of features in your dataset
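
A minimal sketch of deriving `num_features` from the data and checking it before building a model. The helper `check_tabular_input` is hypothetical and not part of kerasfactory; it only illustrates the shape constraint described above.

```python
import numpy as np

def check_tabular_input(x, num_features):
    """Raise ValueError unless x is 2D with num_features columns."""
    x = np.asarray(x)
    if x.ndim != 2:
        raise ValueError(
            f"expected a 2D array (batch_size, num_features), got shape {x.shape}"
        )
    if x.shape[-1] != num_features:
        raise ValueError(
            f"expected {num_features} features, got {x.shape[-1]}"
        )
    return x

X = np.zeros((32, 8), dtype=np.float32)
num_features = X.shape[-1]  # derive the value from the data instead of hard-coding it
check_tabular_input(X, num_features)
```

Reading the value from `X.shape[-1]` keeps the layer configuration in sync when columns are added to or dropped from the dataset.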

📈 Performance Characteristics

Speed: ⚡⚡⚡ Fast - simple mathematical operations
Memory: 💾💾 Low memory usage - three learnable per-feature vectors (impute, gamma, beta), so the parameter count grows linearly with num_features
Accuracy: 🎯🎯🎯🎯 Excellent for handling missing values and normalization
Best For: Tabular data with missing values requiring end-to-end learning

🎨 Examples

Example 1: Missing Data Handling

import keras
import numpy as np
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Create data with different missing patterns
def create_missing_data():
    np.random.seed(42)
    n_samples, n_features = 1000, 8

    # Create base data
    data = np.random.normal(0, 1, (n_samples, n_features))

    # Introduce missing values with different patterns
    # Random missing
    random_mask = np.random.random((n_samples, n_features)) < 0.1
    data[random_mask] = np.nan

    # Column-specific missing (some columns have more missing values)
    data[:, 2][np.random.random(n_samples) < 0.3] = np.nan  # 30% missing in column 2
    data[:, 5][np.random.random(n_samples) < 0.2] = np.nan  # 20% missing in column 5

    return data

# Create and preprocess data
missing_data = create_missing_data()
print(f"Missing data shape: {missing_data.shape}")
print(f"Missing values per column: {np.isnan(missing_data).sum(axis=0)}")

# Build model with differentiable preprocessing
inputs = keras.Input(shape=(8,))
x = DifferentiableTabularPreprocessor(num_features=8)(inputs)
x = keras.layers.Dense(32, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(16, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')

# Test preprocessing
test_data = keras.ops.convert_to_tensor(missing_data[:10], dtype="float32")
preprocessed = model.layers[1](test_data)  # layers[0] is the InputLayer in a functional model
print(f"Preprocessed shape: {preprocessed.shape}")
print(f"Has NaNs after preprocessing: {keras.ops.any(keras.ops.isnan(preprocessed))}")

Example 2: Feature-Specific Preprocessing

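from kerasfactory.layers import DifferentiableTabularPreprocessor

# Analyze learned preprocessing parameters
def analyze_preprocessing_parameters(model):
    """Analyze the learned preprocessing parameters."""
    # Look the layer up by type: in a functional model, layers[0] is the InputLayer
    preprocessor = next(
        layer for layer in model.layers
        if isinstance(layer, DifferentiableTabularPreprocessor)
    )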

    # Get learned parameters
    imputation_values = preprocessor.impute.numpy()
    gamma_values = preprocessor.gamma.numpy()
    beta_values = preprocessor.beta.numpy()

    print("Learned Preprocessing Parameters:")
    print("=" * 50)

    for i in range(len(imputation_values)):
        print(f"Feature {i+1}:")
        print(f"  Imputation value: {imputation_values[i]:.4f}")
        print(f"  Gamma (scaling): {gamma_values[i]:.4f}")
        print(f"  Beta (shifting): {beta_values[i]:.4f}")
        print()

    return {
        'imputation': imputation_values,
        'gamma': gamma_values,
        'beta': beta_values
    }

# Use with your trained model
# params = analyze_preprocessing_parameters(model)
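
To interpret the learned imputation values, it can help to compare them with simple column statistics of the training data. The sketch below is illustrative: `imputation_report` is a hypothetical helper, the `learned_impute` values are placeholders and not real training output, and the comparison assumes the imputation vector lives in the raw input space (that is, it is applied before gamma and beta).

```python
import numpy as np

def imputation_report(data, learned_impute):
    """Compare a learned imputation vector with per-column mean and median."""
    col_mean = np.nanmean(data, axis=0)      # mean of observed values only
    col_median = np.nanmedian(data, axis=0)  # median of observed values only
    return {
        "mean": col_mean,
        "median": col_median,
        "learned_minus_mean": np.asarray(learned_impute, dtype=float) - col_mean,
    }

data = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, 8.0]])

# Placeholder values standing in for params['imputation'] from a trained model
report = imputation_report(data, learned_impute=[2.5, 5.0])
print(report["learned_minus_mean"])  # differences of 0.5 and -1.0
```

A large gap between the learned value and the column mean is not necessarily a problem: it may indicate that missingness itself carries signal for the task.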

Example 3: Comparison with Traditional Preprocessing

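import keras
import numpy as np
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Compare with traditional preprocessing methods (requires scikit-learn)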
def compare_preprocessing_methods():
    # Create data with missing values
    data = np.random.normal(0, 1, (100, 5))
    data[data < -1] = np.nan  # Introduce missing values

    # Traditional preprocessing
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Impute missing values
    imputer = SimpleImputer(strategy='mean')
    data_imputed = imputer.fit_transform(data)

    # Standardize
    scaler = StandardScaler()
    data_traditional = scaler.fit_transform(data_imputed)

    # Differentiable preprocessing
    inputs = keras.Input(shape=(5,))
    x = DifferentiableTabularPreprocessor(num_features=5)(inputs)
    model = keras.Model(inputs, x)

    # Apply differentiable preprocessing
    data_differentiable = model(keras.ops.convert_to_tensor(data, dtype="float32"))

    print("Traditional Preprocessing:")
    print(f"  Mean: {np.mean(data_traditional, axis=0)}")
    print(f"  Std: {np.std(data_traditional, axis=0)}")

    print("\nDifferentiable Preprocessing:")
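    # convert_to_numpy works on every Keras backend, unlike tensor.numpy()
    print(f"  Mean: {keras.ops.convert_to_numpy(keras.ops.mean(data_differentiable, axis=0))}")
    print(f"  Std: {keras.ops.convert_to_numpy(keras.ops.std(data_differentiable, axis=0))}")

    return data_traditional, keras.ops.convert_to_numpy(data_differentiable)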

# Compare methods
# traditional, differentiable = compare_preprocessing_methods()
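
One way to read this comparison: mean imputation followed by z-score standardization is itself an impute-then-affine transformation, so it is one point in the parameter space the layer can search. The NumPy sketch below checks that equivalence. It assumes the layer computes `gamma * imputed + beta`; that formula is an assumption for illustration and is not confirmed by the source shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
data[rng.random(data.shape) < 0.1] = np.nan  # about 10% missing

# Traditional pipeline: mean imputation followed by z-score standardization
col_mean = np.nanmean(data, axis=0)
imputed = np.where(np.isnan(data), col_mean, data)
col_std = imputed.std(axis=0)
traditional = (imputed - imputed.mean(axis=0)) / col_std

# The same result expressed as impute-then-affine parameters
impute = col_mean
gamma = 1.0 / col_std
beta = -col_mean / col_std
affine = gamma * np.where(np.isnan(data), impute, data) + beta

print(np.allclose(traditional, affine))  # True
```

This also suggests why the untrained layer's output statistics generally differ from the standardized ones: the two pipelines only coincide if training happens to arrive at these particular parameter values.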

💡 Tips & Best Practices

Feature Count: Must match the number of features in your dataset
Missing Values: Works best with moderate amounts of missing data
Initialization: Parameters are initialized to reasonable defaults
End-to-End Learning: Let the model learn optimal preprocessing parameters
Monitoring: Track learned parameters to understand preprocessing behavior
Combination: Use with other preprocessing layers for complex pipelines
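
Before relying on learned imputation, it is worth measuring how much data is actually missing per feature. A minimal sketch follows; both helper names are hypothetical, and the 0.5 threshold is an arbitrary illustrative cutoff, not a recommendation from the library.

```python
import numpy as np

def missing_fraction(data):
    """Fraction of NaN entries in each column of a 2D array."""
    return np.isnan(np.asarray(data, dtype=float)).mean(axis=0)

def flag_sparse_columns(data, threshold=0.5):
    """Indices of columns whose missing fraction exceeds `threshold`."""
    return np.flatnonzero(missing_fraction(data) > threshold)

data = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, np.nan],
    [3.0, np.nan, 1.0],
    [4.0, 5.0, 2.0],
])

fractions = missing_fraction(data)  # 0.0, 0.75, 0.25
sparse = flag_sparse_columns(data)  # column index 1 only
```

Columns flagged this way may be candidates for dropping or for an explicit missingness indicator feature, since a single learned constant carries little information when most values are absent.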

⚠️ Common Pitfalls

Input Shape: Must be 2D tensor (batch_size, num_features)
Feature Mismatch: num_features must match input dimension
NaN Handling: Only handles NaN values, not other missing value representations
Memory Usage: Creates learnable parameters for each feature
Overfitting: Can overfit on small datasets with many features
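
Because the layer only recognizes NaN, datasets that encode missing values with sentinels need converting first. A minimal sketch, assuming `-999.0` is the sentinel used in your data; the helper `to_nan` is hypothetical and not part of kerasfactory.

```python
import numpy as np

def to_nan(x, sentinels=(-999.0,)):
    """Return a float32 copy of x with every sentinel value replaced by NaN."""
    x = np.array(x, dtype=np.float32)  # copy, so the caller's data is untouched
    x[np.isin(x, sentinels)] = np.nan
    return x

raw = [[1.0, -999.0, 3.0],
       [4.0, 2.0, -999.0]]
clean = to_nan(raw)
print(np.isnan(clean))  # True exactly where the sentinel was
```

Apply the same conversion at inference time, otherwise sentinel values are treated as ordinary (and extreme) feature values.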

🔗 Related Layers

DifferentialPreprocessingLayer - Advanced differential preprocessing
DistributionTransformLayer - Distribution transformation
CastToFloat32Layer - Type casting utility
FeatureCutout - Feature regularization

📚 Further Reading

End-to-End Learning in Deep Learning - End-to-end learning concepts
Missing Data Handling - Missing data techniques
Feature Normalization - Feature scaling methods
KerasFactory Layer Explorer - Browse all available layers
Data Preprocessing Tutorial - Complete guide to data preprocessing
