# AdvancedNumericalEmbedding
## Overview
The AdvancedNumericalEmbedding layer embeds continuous numerical features into a higher-dimensional space using a dual-branch architecture: a continuous branch (a small MLP) and a discrete branch (learnable binning followed by an embedding lookup), whose outputs are blended into a single rich representation per feature.
It is particularly useful for tabular data where numerical features benefit from learned representations rather than a single dense projection, combining the strengths of continuous and discrete processing.
## How It Works
The AdvancedNumericalEmbedding layer processes numerical features through a dual-branch architecture:
- Continuous Branch: Each feature is processed via a small MLP with residual connection
- Discrete Branch: Features are discretized into learnable bins with embedding lookup
- Gating Mechanism: A learnable gate combines both branch outputs per feature
- Residual Connection & Normalization: The continuous branch adds a residual connection, with optional batch normalization for training stability
- Output Generation: Produces rich embeddings combining both approaches
```mermaid
graph TD
    A[Input Features: batch_size, num_features] --> B[Continuous Branch]
    A --> C[Discrete Branch]
    B --> D[MLP + ReLU + BatchNorm]
    D --> E[Continuous Embeddings]
    C --> F[Learnable Binning]
    F --> G[Embedding Lookup]
    G --> H[Discrete Embeddings]
    E --> I[Gating Network]
    H --> I
    I --> J[Gate Weights]
    E --> K[Weighted Combination]
    H --> K
    J --> K
    K --> L[Final Embeddings]
    style A fill:#e6f3ff,stroke:#4a86e8
    style L fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
```
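For intuition, the gating step can be sketched roughly as follows. This is an illustrative pseudo-implementation using Keras ops, not the layer's actual source; the tensor shapes are taken from the input/output contract described below.

```python
from keras import ops

def combine_branches(continuous_emb, discrete_emb, gate_logits):
    """Illustrative blend of the two branches per feature and embedding dimension.

    continuous_emb, discrete_emb: (batch, num_features, embedding_dim)
    gate_logits: learnable weights of shape (num_features, embedding_dim)
    """
    gate = ops.sigmoid(gate_logits)  # squash to (0, 1)
    return gate * continuous_emb + (1.0 - gate) * discrete_emb
```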
## Why Use This Layer?
| Challenge | Traditional Approach | AdvancedNumericalEmbedding's Solution |
|---|---|---|
| Feature Representation | Simple dense layers or one-hot encoding | Dual-branch architecture combining continuous and discrete processing |
| Numerical Features | Treat all numerical features uniformly | Specialized processing for different numerical characteristics |
| Embedding Learning | Separate embeddings for categorical features only | Unified embedding for both continuous and discrete aspects |
| Feature Interactions | Limited interaction modeling | Rich interactions through gating and residual connections |
## Use Cases
- Mixed Data Types: Processing both continuous and discrete numerical features
- Feature Engineering: Creating rich embeddings for numerical features
- Representation Learning: Learning sophisticated feature representations
- Tabular Deep Learning: Advanced preprocessing for tabular neural networks
- Transfer Learning: Creating reusable feature embeddings
## Quick Start

### Basic Usage
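The original snippet was lost from this page; below is a minimal sketch of standalone usage, assuming the layer is importable from `kerasfactory.layers` as the module path in the API reference suggests.

```python
import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

# 32 samples, 5 numerical features (already normalized).
x = np.random.randn(32, 5).astype("float32")

layer = AdvancedNumericalEmbedding(
    embedding_dim=8,       # output embedding size per feature
    mlp_hidden_units=16,   # hidden width of the continuous-branch MLP
    num_bins=10,           # bins for the discrete branch
)
embeddings = layer(x)
print(embeddings.shape)  # expected: (32, 5, 8)
```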
### In a Sequential Model
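A hedged sketch of using the layer inside `keras.Sequential`. Because the output is 3D `(batch, num_features, embedding_dim)`, it is flattened before the dense head; the import path and feature count are assumptions.

```python
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

model = keras.Sequential([
    keras.layers.Input(shape=(10,)),               # 10 numerical features
    AdvancedNumericalEmbedding(embedding_dim=16),  # -> (batch, 10, 16)
    keras.layers.Flatten(),                        # -> (batch, 160)
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```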
### In a Functional Model
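A sketch of the same idea with the functional API; layer arguments use only the documented parameters, while the surrounding head is illustrative.

```python
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

inputs = keras.Input(shape=(8,), name="numeric_features")
embedded = AdvancedNumericalEmbedding(
    embedding_dim=16,
    mlp_hidden_units=32,
    num_bins=20,
)(inputs)                                   # -> (batch, 8, 16)
x = keras.layers.Flatten()(embedded)
x = keras.layers.Dense(128, activation="relu")(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```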
### Advanced Configuration
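A sketch of a more heavily configured instance, using per-feature `init_min`/`init_max` lists (one value per input feature) as described in the parameter table below; the specific values are illustrative.

```python
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

layer = AdvancedNumericalEmbedding(
    embedding_dim=32,
    mlp_hidden_units=64,
    num_bins=20,
    init_min=[-5.0, 0.0, -1.0],   # per-feature lower bin boundaries (3 features)
    init_max=[5.0, 100.0, 1.0],   # per-feature upper bin boundaries
    dropout_rate=0.2,
    use_batch_norm=True,
    name="advanced_numeric_embedding",
)
```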
## API Reference

### kerasfactory.layers.AdvancedNumericalEmbedding
This module implements an AdvancedNumericalEmbedding layer that embeds continuous numerical features into a higher-dimensional space using a combination of continuous and discrete branches.
#### Classes

##### AdvancedNumericalEmbedding
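The generated signature block was lost from this page; the sketch below reconstructs it from the parameter table that follows (the base class and exact type hints are assumptions).

```python
class AdvancedNumericalEmbedding(keras.layers.Layer):
    def __init__(
        self,
        embedding_dim: int = 8,
        mlp_hidden_units: int = 16,
        num_bins: int = 10,
        init_min: float | list[float] = -3.0,
        init_max: float | list[float] = 3.0,
        dropout_rate: float = 0.1,
        use_batch_norm: bool = True,
        name: str | None = None,
        **kwargs,
    ) -> None: ...
```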
Advanced numerical embedding layer for continuous features.
This layer embeds each continuous numerical feature into a higher-dimensional space by combining two branches:
- Continuous Branch: Each feature is processed via a small MLP.
- Discrete Branch: Each feature is discretized into bins using learnable min/max boundaries and then an embedding is looked up for its bin.
A learnable gate combines the two branch outputs per feature and per embedding dimension. Additionally, the continuous branch uses a residual connection and optional batch normalization to improve training stability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embedding_dim` | `int` | Output embedding dimension per feature. | `8` |
| `mlp_hidden_units` | `int` | Hidden units for the continuous branch MLP. | `16` |
| `num_bins` | `int` | Number of bins for discretization. | `10` |
| `init_min` | `float` or `list` | Initial minimum values for discretization boundaries. If a scalar is provided, it is applied to all features. | `-3.0` |
| `init_max` | `float` or `list` | Initial maximum values for discretization boundaries. | `3.0` |
| `dropout_rate` | `float` | Dropout rate applied to the continuous branch. | `0.1` |
| `use_batch_norm` | `bool` | Whether to apply batch normalization to the continuous branch. | `True` |
| `name` | `str` | Name for the layer. | `None` |
**Input shape:** tensor with shape `(batch_size, num_features)`

**Output shape:** tensor with shape `(batch_size, num_features, embedding_dim)`, or `(batch_size, embedding_dim)` if `num_features=1`

**Example**
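The rendered example was lost; below is a minimal reconstruction consistent with the documented input and output shapes (the import path is assumed from the module path above).

```python
import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

x = np.random.randn(64, 3).astype("float32")   # (batch_size, num_features)

layer = AdvancedNumericalEmbedding(
    embedding_dim=8,
    num_bins=10,
    init_min=-3.0,
    init_max=3.0,
)
y = layer(x)
print(y.shape)  # expected: (64, 3, 8)
```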
Initialize the AdvancedNumericalEmbedding layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embedding_dim` | `int` | Embedding dimension. | `8` |
| `mlp_hidden_units` | `int` | Hidden units in MLP. | `16` |
| `num_bins` | `int` | Number of bins for discretization. | `10` |
| `init_min` | `float \| list[float]` | Minimum initialization value. | `-3.0` |
| `init_max` | `float \| list[float]` | Maximum initialization value. | `3.0` |
| `dropout_rate` | `float` | Dropout rate. | `0.1` |
| `use_batch_norm` | `bool` | Whether to use batch normalization. | `True` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
#### Functions
##### compute_output_shape
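The signature block was lost; a sketch reconstructed from the parameter and return tables below (not the actual source).

```python
# Sketch of the method signature:
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    ...

# Example: with embedding_dim=8, (None, 5) should map to (None, 5, 8).
```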
Compute the output shape of the layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `tuple[int, ...]` | Shape of the input tensor. | *required* |

Returns:

| Type | Description |
|---|---|
| `tuple[int, ...]` | Shape of the output tensor. |
## Parameters Deep Dive
### embedding_dim (int)
- Purpose: Output embedding dimension per feature
- Range: 4 to 128+ (typically 8-64)
- Impact: Higher values = richer representations but more parameters
- Recommendation: Start with 8-16, scale based on data complexity
### mlp_hidden_units (int)
- Purpose: Hidden units for the continuous branch MLP
- Range: 8 to 256+ (typically 16-128)
- Impact: Larger values = more complex continuous processing
- Recommendation: Start with 16-32, adjust based on feature complexity
### num_bins (int)
- Purpose: Number of bins for discretization
- Range: 5 to 100+ (typically 10-50)
- Impact: More bins = finer discretization but more parameters
- Recommendation: Start with 10-20, increase for high-precision features
### init_min / init_max (float or list)
- Purpose: Initial minimum/maximum values for discretization boundaries
- Range: -10.0 to 10.0 (typically -3.0 to 3.0)
- Impact: Affects initial bin boundaries and training stability
- Recommendation: Use -3.0 to 3.0 for normalized data, adjust based on data range
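One way to follow this recommendation is to derive the boundaries from training-data quantiles. `quantile_boundaries` below is a hypothetical helper written for this example, not part of the library.

```python
import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

def quantile_boundaries(x_train: np.ndarray, low: float = 0.01, high: float = 0.99):
    """Hypothetical helper: per-feature bin boundaries from data quantiles."""
    init_min = np.quantile(x_train, low, axis=0).tolist()
    init_max = np.quantile(x_train, high, axis=0).tolist()
    return init_min, init_max

x_train = np.random.randn(1000, 4).astype("float32")
init_min, init_max = quantile_boundaries(x_train)
layer = AdvancedNumericalEmbedding(num_bins=20, init_min=init_min, init_max=init_max)
```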
## Performance Characteristics

- Speed: Fast for small to medium feature counts; scales with embedding_dim
- Memory: Moderate usage due to the dual-branch architecture
- Accuracy: Excellent for complex numerical feature processing
- Best For: Tabular data with numerical features requiring rich representations
## Examples

### Example 1: Mixed Data Type Processing
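The original example code was lost; the sketch below illustrates the idea under stated assumptions (feature counts, the categorical vocabulary size, and the import path are made up for the example), combining the numerical embedding with a separately embedded categorical feature.

```python
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

# Hypothetical inputs: 6 numerical features and 1 categorical feature with 12 levels.
num_in = keras.Input(shape=(6,), name="numerical")
cat_in = keras.Input(shape=(1,), dtype="int32", name="categorical")

num_emb = AdvancedNumericalEmbedding(embedding_dim=16, num_bins=15)(num_in)  # (batch, 6, 16)
cat_emb = keras.layers.Embedding(input_dim=12, output_dim=16)(cat_in)        # (batch, 1, 16)

x = keras.layers.Concatenate(axis=1)([num_emb, cat_emb])                     # (batch, 7, 16)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(64, activation="relu")(x)
out = keras.layers.Dense(1, activation="sigmoid")(x)

model = keras.Model([num_in, cat_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```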
### Example 2: Financial Data Embedding
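Again a hedged sketch (the feature names and value ranges are illustrative only), using per-feature boundaries to reflect the very different scales of financial variables.

```python
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

# Illustrative features: log_return, volume_z, volatility, rsi (ranges are made up).
inputs = keras.Input(shape=(4,), name="financial_features")
embedded = AdvancedNumericalEmbedding(
    embedding_dim=24,
    mlp_hidden_units=48,
    num_bins=25,
    init_min=[-0.2, -5.0, 0.0, 0.0],   # per-feature lower boundaries
    init_max=[0.2, 5.0, 1.0, 100.0],   # per-feature upper boundaries
    dropout_rate=0.15,
)(inputs)
x = keras.layers.Flatten()(embedded)
x = keras.layers.Dense(64, activation="relu")(x)
outputs = keras.layers.Dense(1, name="next_day_return")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```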
### Example 3: Multi-Scale Feature Processing
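A sketch of processing feature groups at different scales with separately configured embedding layers; the grouping and layer names are hypothetical.

```python
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding  # import path assumed

# Two hypothetical groups: fine-grained sensor readings and coarse aggregate statistics.
fine_in = keras.Input(shape=(8,), name="fine_features")
coarse_in = keras.Input(shape=(4,), name="coarse_features")

fine_emb = AdvancedNumericalEmbedding(
    embedding_dim=16, num_bins=40, mlp_hidden_units=32, name="fine_embedding",
)(fine_in)                                                    # (batch, 8, 16)
coarse_emb = AdvancedNumericalEmbedding(
    embedding_dim=16, num_bins=10, mlp_hidden_units=16, name="coarse_embedding",
)(coarse_in)                                                  # (batch, 4, 16)

x = keras.layers.Concatenate(axis=1)([fine_emb, coarse_emb])  # (batch, 12, 16)
x = keras.layers.GlobalAveragePooling1D()(x)                  # pool across features
x = keras.layers.Dense(64, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model([fine_in, coarse_in], outputs)
model.compile(optimizer="adam", loss="mse")
```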
## Tips & Best Practices
- Feature Preprocessing: Ensure numerical features are properly normalized
- Bin Count: Use more bins for high-precision features, fewer for general features
- Embedding Dimension: Start with 8-16, scale based on data complexity
- Initialization Range: Set init_min/max based on your data's actual range
- Batch Normalization: Enable for better training stability
- Regularization: Use appropriate dropout to prevent overfitting
## Common Pitfalls
- Input Shape: Must be 2D tensor (batch_size, num_features)
- Memory Usage: Scales with embedding_dim and num_bins
- Initialization: Poor init_min/max can hurt training - match your data range
- Overfitting: Can overfit on small datasets - use regularization
- Feature Count: Works best with moderate number of features (5-50)
## Related Layers
- DistributionAwareEncoder - Distribution-aware feature encoding
- DistributionTransformLayer - Distribution transformation
- GatedFeatureFusion - Feature fusion mechanism
- TabularAttention - Attention-based feature processing
## Further Reading
- Deep Learning for Tabular Data - Tabular deep learning approaches
- Feature Embedding in Neural Networks - Feature learning concepts
- Numerical Feature Processing - Feature engineering techniques
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering