
πŸ”’ AdvancedNumericalEmbedding

πŸ”΄ Advanced βœ… Stable πŸ”₯ Popular

🎯 Overview

The AdvancedNumericalEmbedding layer embeds continuous numerical features into a higher-dimensional space using a sophisticated dual-branch architecture. It combines continuous processing (via MLP) with discrete processing (via learnable binning and embedding lookup) to create rich feature representations.

This layer is particularly powerful for tabular data where numerical features need sophisticated representation learning, combining the benefits of both continuous and discrete processing approaches.

πŸ” How It Works

The AdvancedNumericalEmbedding layer processes numerical features through a dual-branch architecture:

  1. Continuous Branch: Each feature is processed via a small MLP with residual connection
  2. Discrete Branch: Features are discretized into learnable bins with embedding lookup
  3. Gating Mechanism: A learnable gate combines both branch outputs per feature and per embedding dimension (sketched in code below)
  4. Residual Connection: The continuous branch adds a residual connection, with optional batch normalization, for training stability
  5. Output Generation: Produces rich embeddings combining both approaches

graph TD
    A[Input Features: batch_size, num_features] --> B[Continuous Branch]
    A --> C[Discrete Branch]

    B --> D[MLP + ReLU + BatchNorm]
    D --> E[Continuous Embeddings]

    C --> F[Learnable Binning]
    F --> G[Embedding Lookup]
    G --> H[Discrete Embeddings]

    E --> I[Gating Network]
    H --> I
    I --> J[Gate Weights]

    E --> K[Weighted Combination]
    H --> K
    J --> K
    K --> L[Final Embeddings]

    style A fill:#e6f3ff,stroke:#4a86e8
    style L fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
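
The gate described in step 3 mixes the two branches element-wise. Below is a minimal, illustrative sketch of that combination using keras.ops; the tensor names and the zero-initialized gate logits are assumptions for illustration, not the layer's internal code.

import keras
from keras import ops

batch_size, num_features, embedding_dim = 32, 5, 8

# Stand-ins for the two branch outputs, each (batch_size, num_features, embedding_dim)
continuous_emb = keras.random.normal((batch_size, num_features, embedding_dim))
discrete_emb = keras.random.normal((batch_size, num_features, embedding_dim))

# Hypothetical gate logits, one per feature and embedding dimension (learnable in the real layer)
gate_logits = ops.zeros((num_features, embedding_dim))
gate = ops.sigmoid(gate_logits)  # values in (0, 1); broadcasts over the batch axis

# Weighted combination: gate weights the continuous branch, (1 - gate) the discrete branch
combined = gate * continuous_emb + (1.0 - gate) * discrete_emb
print(combined.shape)  # (32, 5, 8)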

πŸ’‘ Why Use This Layer?

Challenge              | Traditional Approach                      | AdvancedNumericalEmbedding's Solution
Feature Representation | Simple dense layers or one-hot encoding   | 🎯 Dual-branch architecture combining continuous and discrete processing
Numerical Features     | Treat all numerical features uniformly    | ⚑ Specialized processing for different numerical characteristics
Embedding Learning     | Separate embedding for categorical only   | 🧠 Unified embedding for both continuous and discrete aspects
Feature Interactions   | Limited interaction modeling              | πŸ”— Rich interactions through gating and residual connections

πŸ“Š Use Cases

  • Mixed Data Types: Processing both continuous and discrete numerical features
  • Feature Engineering: Creating rich embeddings for numerical features
  • Representation Learning: Learning sophisticated feature representations
  • Tabular Deep Learning: Advanced preprocessing for tabular neural networks
  • Transfer Learning: Creating reusable feature embeddings

πŸš€ Quick Start

Basic Usage

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Create sample input data
batch_size, num_features = 32, 5
x = keras.random.normal((batch_size, num_features))

# Apply advanced numerical embedding
embedding = AdvancedNumericalEmbedding(
    embedding_dim=8,
    mlp_hidden_units=16,
    num_bins=10
)
embedded = embedding(x)

print(f"Input shape: {x.shape}")           # (32, 5)
print(f"Output shape: {embedded.shape}")   # (32, 5, 8)

In a Sequential Model

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

model = keras.Sequential([
    AdvancedNumericalEmbedding(
        embedding_dim=16,
        mlp_hidden_units=32,
        num_bins=20
    ),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In a Functional Model

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Define inputs
inputs = keras.Input(shape=(10,))  # 10 numerical features

# Apply advanced embedding
x = AdvancedNumericalEmbedding(
    embedding_dim=32,
    mlp_hidden_units=64,
    num_bins=15
)(inputs)

# Flatten and continue processing
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(128, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

Advanced Configuration

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Advanced configuration with custom parameters
embedding = AdvancedNumericalEmbedding(
    embedding_dim=64,           # Higher embedding dimension
    mlp_hidden_units=128,       # More hidden units
    num_bins=50,                # More bins for finer discretization
    init_min=-5.0,              # Custom initialization range
    init_max=5.0,
    dropout_rate=0.2,           # Higher dropout for regularization
    use_batch_norm=True,        # Enable batch normalization
    name="custom_advanced_embedding"
)

# Use in a complex model
inputs = keras.Input(shape=(20,))
x = embedding(inputs)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(256, activation='relu')(x)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.Dense(64, activation='relu')(x)
outputs = keras.layers.Dense(5, activation='softmax')(x)

model = keras.Model(inputs, outputs)

πŸ“– API Reference

kerasfactory.layers.AdvancedNumericalEmbedding

This module implements an AdvancedNumericalEmbedding layer that embeds continuous numerical features into a higher-dimensional space using a combination of continuous and discrete branches.

Classes

AdvancedNumericalEmbedding
AdvancedNumericalEmbedding(
    embedding_dim: int = 8,
    mlp_hidden_units: int = 16,
    num_bins: int = 10,
    init_min: float | list[float] = -3.0,
    init_max: float | list[float] = 3.0,
    dropout_rate: float = 0.1,
    use_batch_norm: bool = True,
    name: str | None = None,
    **kwargs: Any
)

Advanced numerical embedding layer for continuous features.

This layer embeds each continuous numerical feature into a higher-dimensional space by combining two branches:

  1. Continuous Branch: Each feature is processed via a small MLP.
  2. Discrete Branch: Each feature is discretized into bins using learnable min/max boundaries and then an embedding is looked up for its bin.

A learnable gate combines the two branch outputs per feature and per embedding dimension. Additionally, the continuous branch uses a residual connection and optional batch normalization to improve training stability.

Parameters:

  • embedding_dim (int, default 8): Output embedding dimension per feature.
  • mlp_hidden_units (int, default 16): Hidden units for the continuous branch MLP.
  • num_bins (int, default 10): Number of bins for discretization.
  • init_min (float or list, default -3.0): Initial minimum values for discretization boundaries. If a scalar is provided, it is applied to all features.
  • init_max (float or list, default 3.0): Initial maximum values for discretization boundaries.
  • dropout_rate (float, default 0.1): Dropout rate applied to the continuous branch.
  • use_batch_norm (bool, default True): Whether to apply batch normalization to the continuous branch.
  • name (str, default None): Name for the layer.

Input shape

Tensor with shape: (batch_size, num_features)

Output shape

Tensor with shape: (batch_size, num_features, embedding_dim) or (batch_size, embedding_dim) if num_features=1

Example
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Create sample input data
x = keras.random.normal((32, 5))  # 32 samples, 5 features

# Create the layer
embedding = AdvancedNumericalEmbedding(
    embedding_dim=8,
    mlp_hidden_units=16,
    num_bins=10
)
y = embedding(x)
print("Output shape:", y.shape)  # (32, 5, 8)

Initialize the AdvancedNumericalEmbedding layer.

Parameters:

  • embedding_dim (int, default 8): Embedding dimension.
  • mlp_hidden_units (int, default 16): Hidden units in MLP.
  • num_bins (int, default 10): Number of bins for discretization.
  • init_min (float | list[float], default -3.0): Minimum initialization value.
  • init_max (float | list[float], default 3.0): Maximum initialization value.
  • dropout_rate (float, default 0.1): Dropout rate.
  • use_batch_norm (bool, default True): Whether to use batch normalization.
  • name (str | None, default None): Name of the layer.
  • **kwargs (Any): Additional keyword arguments.

Source code in kerasfactory/layers/AdvancedNumericalEmbedding.py
def __init__(
    self,
    embedding_dim: int = 8,
    mlp_hidden_units: int = 16,
    num_bins: int = 10,
    init_min: float | list[float] = -3.0,
    init_max: float | list[float] = 3.0,
    dropout_rate: float = 0.1,
    use_batch_norm: bool = True,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the AdvancedNumericalEmbedding layer.

    Args:
        embedding_dim: Embedding dimension.
        mlp_hidden_units: Hidden units in MLP.
        num_bins: Number of bins for discretization.
        init_min: Minimum initialization value.
        init_max: Maximum initialization value.
        dropout_rate: Dropout rate.
        use_batch_norm: Whether to use batch normalization.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._embedding_dim = embedding_dim
    self._mlp_hidden_units = mlp_hidden_units
    self._num_bins = num_bins
    self._init_min = init_min
    self._init_max = init_max
    self._dropout_rate = dropout_rate
    self._use_batch_norm = use_batch_norm

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.embedding_dim = self._embedding_dim
    self.mlp_hidden_units = self._mlp_hidden_units
    self.num_bins = self._num_bins
    self.init_min = self._init_min
    self.init_max = self._init_max
    self.dropout_rate = self._dropout_rate
    self.use_batch_norm = self._use_batch_norm

    # Initialize instance variables
    self.num_features: int | None = None
    self.hidden_layer: layers.Dense | None = None
    self.output_layer: layers.Dense | None = None
    self.dropout_layer: layers.Dropout | None = None
    self.batch_norm: layers.BatchNormalization | None = None
    self.residual_proj: layers.Dense | None = None
    self.bin_embeddings: list[layers.Embedding] = []
    self.learned_min: layers.Embedding | None = None
    self.learned_max: layers.Embedding | None = None
    self.gate: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

Functions

compute_output_shape

compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int, ...], required): Shape of the input tensor.

Returns:

  • tuple[int, ...]: Shape of the output tensor.

Source code in kerasfactory/layers/AdvancedNumericalEmbedding.py
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor.

    Returns:
        Shape of the output tensor.
    """
    if self.num_features == 1:
        return input_shape[:-1] + (self.embedding_dim,)
    else:
        return input_shape[:-1] + (self.num_features, self.embedding_dim)

πŸ”§ Parameters Deep Dive

embedding_dim (int)

  • Purpose: Output embedding dimension per feature
  • Range: 4 to 128+ (typically 8-64)
  • Impact: Higher values = richer representations but more parameters
  • Recommendation: Start with 8-16, scale based on data complexity (a quick parameter-count check is sketched below)
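
A rough way to see how embedding_dim (together with num_bins) drives model size is to build the layer and compare count_params(); the feature count and parameter values below are placeholders.

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

num_features = 10
x = keras.random.normal((1, num_features))  # dummy batch just to trigger weight creation

for embedding_dim in (8, 32, 64):
    layer = AdvancedNumericalEmbedding(embedding_dim=embedding_dim, num_bins=10)
    _ = layer(x)  # builds the layer's weights
    print(f"embedding_dim={embedding_dim}: {layer.count_params()} parameters")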

mlp_hidden_units (int)

  • Purpose: Hidden units for the continuous branch MLP
  • Range: 8 to 256+ (typically 16-128)
  • Impact: Larger values = more complex continuous processing
  • Recommendation: Start with 16-32, adjust based on feature complexity

num_bins (int)

  • Purpose: Number of bins for discretization
  • Range: 5 to 100+ (typically 10-50)
  • Impact: More bins = finer discretization but more parameters
  • Recommendation: Start with 10-20, increase for high-precision features (a binning sketch follows this list)
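
To make the discrete branch concrete, here is a minimal, illustrative sketch of how a value could be mapped to a bin index given min/max boundaries; in the actual layer these boundaries are learnable per feature, and the names below are assumptions for illustration only.

import keras
from keras import ops

num_bins = 10
learned_min, learned_max = -3.0, 3.0  # learnable per feature in the real layer

x = keras.random.normal((32, 1))  # a single feature column
scaled = (x - learned_min) / (learned_max - learned_min)  # roughly map into [0, 1]
bin_idx = ops.clip(ops.floor(scaled * num_bins), 0, num_bins - 1)
bin_idx = ops.cast(bin_idx, "int32")  # index into a (num_bins, embedding_dim) embedding table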

init_min / init_max (float or list)

  • Purpose: Initial minimum/maximum values for discretization boundaries
  • Range: -10.0 to 10.0 (typically -3.0 to 3.0)
  • Impact: Affects initial bin boundaries and training stability
  • Recommendation: Use -3.0 to 3.0 for normalized data, otherwise adjust to your data's actual range (a percentile-based sketch follows this list)
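
One way to set init_min/init_max from the training data rather than guessing is to use per-feature percentiles, which the layer accepts as lists; the data below is a placeholder.

import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding

X_train = np.random.normal(0, 2, size=(1000, 6)).astype("float32")  # placeholder training data

init_min = np.percentile(X_train, 1, axis=0).tolist()   # one value per feature
init_max = np.percentile(X_train, 99, axis=0).tolist()

embedding = AdvancedNumericalEmbedding(
    embedding_dim=16,
    num_bins=20,
    init_min=init_min,  # per-feature lists are accepted (see the API reference above)
    init_max=init_max,
)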

πŸ“ˆ Performance Characteristics

  • Speed: ⚑⚑⚑ Fast for small to medium feature counts, scales with embedding_dim
  • Memory: πŸ’ΎπŸ’ΎπŸ’Ύ Moderate memory usage due to dual-branch architecture
  • Accuracy: 🎯🎯🎯🎯 Excellent for complex numerical feature processing
  • Best For: Tabular data with numerical features requiring rich representations

🎨 Examples

Example 1: Mixed Data Type Processing

import keras
import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding

# Simulate mixed numerical data
batch_size = 1000

# Continuous features (age, income, etc.)
continuous_features = np.random.normal(0, 1, (batch_size, 5))

# Discrete-like features (counts, ratings, etc.)
discrete_features = np.random.randint(0, 10, (batch_size, 3))

# Combine features
numerical_data = np.concatenate([continuous_features, discrete_features], axis=1)

# Build model with advanced embedding
inputs = keras.Input(shape=(8,))  # 8 numerical features

# Apply advanced numerical embedding
x = AdvancedNumericalEmbedding(
    embedding_dim=16,
    mlp_hidden_units=32,
    num_bins=20,
    init_min=-3.0,
    init_max=3.0
)(inputs)

# Process embeddings
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(32, activation='relu')(x)
output = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, output)
model.compile(optimizer='adam', loss='binary_crossentropy')

Example 2: Financial Data Embedding

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Process financial data with advanced numerical embedding
def create_financial_model():
    inputs = keras.Input(shape=(15,))  # 15 financial features

    # Advanced numerical embedding
    x = AdvancedNumericalEmbedding(
        embedding_dim=32,
        mlp_hidden_units=64,
        num_bins=25,
        init_min=-5.0,
        init_max=5.0,
        dropout_rate=0.1
    )(inputs)

    # Process embeddings
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(128, activation='relu')(x)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Dropout(0.3)(x)

    # Multiple outputs
    risk_score = keras.layers.Dense(1, activation='sigmoid', name='risk')(x)
    category = keras.layers.Dense(5, activation='softmax', name='category')(x)

    return keras.Model(inputs, [risk_score, category])

model = create_financial_model()
model.compile(
    optimizer='adam',
    loss={'risk': 'binary_crossentropy', 'category': 'categorical_crossentropy'},
    loss_weights={'risk': 1.0, 'category': 0.5}
)

Example 3: Multi-Scale Feature Processing

import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Process features at different scales with advanced embedding
def create_multi_scale_model():
    inputs = keras.Input(shape=(20,))

    # Different embedding configurations for different feature groups
    # Group 1: High-precision features (0-5)
    high_precision = inputs[:, :5]
    high_precision_emb = AdvancedNumericalEmbedding(
        embedding_dim=16,
        mlp_hidden_units=32,
        num_bins=50,  # More bins for high precision
        init_min=0.0,
        init_max=5.0
    )(high_precision)

    # Group 2: General features (5-15)
    general_features = inputs[:, 5:15]
    general_emb = AdvancedNumericalEmbedding(
        embedding_dim=24,
        mlp_hidden_units=48,
        num_bins=20,
        init_min=-3.0,
        init_max=3.0
    )(general_features)

    # Group 3: Categorical-like features (15-20)
    categorical_like = inputs[:, 15:20]
    categorical_emb = AdvancedNumericalEmbedding(
        embedding_dim=12,
        mlp_hidden_units=24,
        num_bins=10,
        init_min=0.0,
        init_max=10.0
    )(categorical_like)

    # Combine all embeddings
    all_embeddings = keras.layers.Concatenate()([
        keras.layers.Flatten()(high_precision_emb),
        keras.layers.Flatten()(general_emb),
        keras.layers.Flatten()(categorical_emb)
    ])

    # Final processing
    x = keras.layers.Dense(128, activation='relu')(all_embeddings)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.Dense(64, activation='relu')(x)
    output = keras.layers.Dense(1, activation='sigmoid')(x)

    return keras.Model(inputs, output)

model = create_multi_scale_model()
model.compile(optimizer='adam', loss='binary_crossentropy')

πŸ’‘ Tips & Best Practices

  • Feature Preprocessing: Ensure numerical features are properly normalized (see the sketch after this list)
  • Bin Count: Use more bins for high-precision features, fewer for general features
  • Embedding Dimension: Start with 8-16, scale based on data complexity
  • Initialization Range: Set init_min/max based on your data's actual range
  • Batch Normalization: Enable for better training stability
  • Regularization: Use appropriate dropout to prevent overfitting
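
A minimal sketch of the normalization tip above, assuming a standard keras.layers.Normalization layer adapted on the training data and placed in front of AdvancedNumericalEmbedding; the data and layer sizes are placeholders.

import keras
import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding

X_train = np.random.normal(5.0, 10.0, size=(1000, 8)).astype("float32")  # placeholder, not zero-mean

normalizer = keras.layers.Normalization()
normalizer.adapt(X_train)  # learns per-feature mean and variance

inputs = keras.Input(shape=(8,))
x = normalizer(inputs)  # roughly zero-mean, unit-variance features
x = AdvancedNumericalEmbedding(embedding_dim=16, num_bins=20)(x)  # default init_min/init_max of -3/3 now match the data
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)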

⚠️ Common Pitfalls

  • Input Shape: Must be a 2D tensor (batch_size, num_features) (see the reshape sketch after this list)
  • Memory Usage: Scales with embedding_dim and num_bins
  • Initialization: Poor init_min/max can hurt training - match your data range
  • Overfitting: Can overfit on small datasets - use regularization
  • Feature Count: Works best with a moderate number of features (5-50)
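
A small sketch of the input-shape pitfall: a single feature column must be reshaped to rank 2 before it reaches the layer; the array here is a placeholder.

import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding

feature = np.random.normal(size=(256,)).astype("float32")  # shape (256,): rank 1, not accepted
feature_2d = feature.reshape(-1, 1)                         # shape (256, 1): batch of one feature

embedding = AdvancedNumericalEmbedding(embedding_dim=8, num_bins=10)
out = embedding(feature_2d)
print(out.shape)  # (256, 8) for a single feature, per the output-shape note above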

πŸ“š Further Reading