# MultiResolutionTabularAttention

## Overview
The MultiResolutionTabularAttention layer is a sophisticated attention mechanism designed specifically for mixed-type tabular data. Unlike standard attention layers that treat all features uniformly, this layer recognizes that numerical and categorical features have fundamentally different characteristics and require specialized processing.
This layer implements separate attention mechanisms for numerical and categorical features, along with cross-attention between them, enabling the model to learn optimal representations for each data type while capturing their interactions.
## How It Works
The MultiResolutionTabularAttention processes mixed-type tabular data through specialized attention pathways:
- Numerical Feature Processing: Dedicated attention for continuous numerical features
- Categorical Feature Processing: Specialized attention for discrete categorical features
- Cross-Attention: Bidirectional attention between numerical and categorical features
- Feature Fusion: Intelligent combination of both feature types
```mermaid
graph TD
    A[Numerical Features] --> B[Numerical Projection]
    C[Categorical Features] --> D[Categorical Projection]
    B --> E[Numerical Self-Attention]
    D --> F[Categorical Self-Attention]
    E --> G[Numerical Cross-Attention]
    F --> H[Categorical Cross-Attention]
    G --> I[Numerical LayerNorm + Residual]
    H --> J[Categorical LayerNorm + Residual]
    I --> K[Numerical Output]
    J --> L[Categorical Output]

    style A fill:#e6f3ff,stroke:#4a86e8
    style C fill:#fff9e6,stroke:#ffb74d
    style K fill:#e8f5e9,stroke:#66bb6a
    style L fill:#e8f5e9,stroke:#66bb6a
```
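To make the diagram concrete, the sketch below rebuilds the same data flow from stock Keras layers (`Dense` projections, `MultiHeadAttention`, `LayerNormalization`). It is purely conceptual; the actual `MultiResolutionTabularAttention` implementation may differ in detail.

```python
import keras
from keras import layers

d_model, num_heads = 64, 4

num_in = keras.Input(shape=(10, 8))   # (batch, rows, numerical features)
cat_in = keras.Input(shape=(10, 5))   # (batch, rows, categorical features)

# Project each feature type into the shared d_model space.
num_proj = layers.Dense(d_model)(num_in)
cat_proj = layers.Dense(d_model)(cat_in)

# Per-type self-attention.
num_self = layers.MultiHeadAttention(num_heads, d_model // num_heads)(num_proj, num_proj)
cat_self = layers.MultiHeadAttention(num_heads, d_model // num_heads)(cat_proj, cat_proj)

# Cross-attention: each feature type attends to the other.
num_cross = layers.MultiHeadAttention(num_heads, d_model // num_heads)(num_self, cat_self)
cat_cross = layers.MultiHeadAttention(num_heads, d_model // num_heads)(cat_self, num_self)

# Residual connection + layer norm per pathway.
num_out = layers.LayerNormalization()(layers.Add()([num_self, num_cross]))
cat_out = layers.LayerNormalization()(layers.Add()([cat_self, cat_cross]))

toy = keras.Model([num_in, cat_in], [num_out, cat_out])
toy.summary()
```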
## Why Use This Layer?
| Challenge | Traditional Approach | MultiResolutionTabularAttention's Solution |
|---|---|---|
| Mixed Data Types | Treat all features the same way | Specialized processing for numerical vs. categorical features |
| Feature Interactions | Simple concatenation or basic attention | Cross-attention between different feature types |
| Information Loss | One-size-fits-all representations | Preserved semantics of each data type |
| Complex Relationships | Limited cross-type learning | Rich interactions between numerical and categorical features |
## Use Cases
- Customer Analytics: Combining numerical metrics (age, income) with categorical data (region, product category)
- Medical Diagnosis: Processing lab values (numerical) alongside symptoms and demographics (categorical)
- E-commerce: Analyzing purchase amounts and quantities (numerical) with product categories and user segments (categorical)
- Financial Modeling: Combining market indicators (numerical) with sector classifications and risk categories (categorical)
- Survey Analysis: Processing rating scales (numerical) with demographic and preference data (categorical)
## Quick Start

### Basic Usage
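A minimal sketch, assuming the class can be imported from `kerasfactory.layers` and using placeholder shapes (10 rows of 8 numerical and 5 already-encoded categorical features per example):

```python
import numpy as np
from kerasfactory.layers import MultiResolutionTabularAttention

# Toy batch: 32 examples, each with 10 rows of 8 numerical
# and 5 numerically encoded categorical features.
numerical = np.random.rand(32, 10, 8).astype("float32")
categorical = np.random.rand(32, 10, 5).astype("float32")

attention = MultiResolutionTabularAttention(num_heads=4, d_model=64, dropout_rate=0.1)

# The layer takes a list of two tensors and returns a list of two tensors,
# each projected to d_model: (32, 10, 64) for both feature types.
num_out, cat_out = attention([numerical, categorical])
print(num_out.shape, cat_out.shape)
```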
### In a Sequential Model
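Because the layer consumes a list of two inputs, it does not slot directly into `keras.Sequential`, which expects single-input layers. A sketch of one workaround under that assumption: apply the attention functionally, then route the fused output through a Sequential classification head.

```python
import keras
from keras import layers
from kerasfactory.layers import MultiResolutionTabularAttention

num_in = keras.Input(shape=(10, 8), name="numerical")
cat_in = keras.Input(shape=(10, 5), name="categorical")

num_att, cat_att = MultiResolutionTabularAttention(num_heads=4, d_model=64)([num_in, cat_in])

# Fuse both attention outputs and hand them to a plain Sequential head.
fused = layers.Concatenate(axis=-1)([num_att, cat_att])
pooled = layers.GlobalAveragePooling1D()(fused)

head = keras.Sequential(
    [
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(1, activation="sigmoid"),
    ],
    name="classification_head",
)

model = keras.Model(inputs=[num_in, cat_in], outputs=head(pooled))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```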
### In a Functional Model
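A sketch of a fully functional (multi-input, single-output) model; feature counts and layer sizes are illustrative:

```python
import keras
from keras import layers
from kerasfactory.layers import MultiResolutionTabularAttention

# Two inputs: rows of numerical and categorical features.
num_in = keras.Input(shape=(10, 8), name="numerical_features")
cat_in = keras.Input(shape=(10, 5), name="categorical_features")

num_att, cat_att = MultiResolutionTabularAttention(
    num_heads=8, d_model=64, dropout_rate=0.1
)([num_in, cat_in])

# Pool each pathway separately, then combine.
num_pooled = layers.GlobalAveragePooling1D()(num_att)
cat_pooled = layers.GlobalAveragePooling1D()(cat_att)
combined = layers.Concatenate()([num_pooled, cat_pooled])

x = layers.Dense(64, activation="relu")(combined)
x = layers.Dropout(0.1)(x)
output = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs=[num_in, cat_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```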
### Advanced Configuration
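A more heavily parameterized sketch: more heads, a larger `d_model`, stronger dropout, and a standard training setup with early stopping. The hyperparameter values are illustrative starting points, not library recommendations:

```python
import keras
from keras import layers
from kerasfactory.layers import MultiResolutionTabularAttention

SEQ_LEN, NUM_FEATS, CAT_FEATS = 20, 16, 10

num_in = keras.Input(shape=(SEQ_LEN, NUM_FEATS), name="numerical_features")
cat_in = keras.Input(shape=(SEQ_LEN, CAT_FEATS), name="categorical_features")

# Larger model dimension and more heads for complex cross-type interactions;
# d_model must stay divisible by num_heads.
attention = MultiResolutionTabularAttention(
    num_heads=16,
    d_model=256,
    dropout_rate=0.2,
    name="multi_resolution_attention",
)
num_att, cat_att = attention([num_in, cat_in])

fused = layers.Concatenate(axis=-1)([num_att, cat_att])
x = layers.GlobalAveragePooling1D()(fused)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation="relu")(x)
output = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs=[num_in, cat_in], outputs=output)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3),
]
# model.fit([num_train, cat_train], y_train, validation_split=0.2,
#           epochs=50, callbacks=callbacks)
```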
## API Reference

### `kerasfactory.layers.MultiResolutionTabularAttention`
This module implements a MultiResolutionTabularAttention layer that applies separate attention mechanisms for numerical and categorical features, along with cross-attention between them. It's particularly useful for mixed-type tabular data.
#### Classes

##### `MultiResolutionTabularAttention`
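A constructor signature sketch, reconstructed from the parameter tables below (argument order assumed):

```python
MultiResolutionTabularAttention(
    num_heads: int,
    d_model: int,
    dropout_rate: float = 0.1,
    name: str | None = None,
    **kwargs: Any,
)
```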
Custom layer to apply multi-resolution attention for mixed-type tabular data.
This layer implements separate attention mechanisms for numerical and categorical features, along with cross-attention between them. It's designed to handle the different characteristics of numerical and categorical features in tabular data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_heads` | `int` | Number of attention heads | required |
| `d_model` | `int` | Dimensionality of the attention model | required |
| `dropout_rate` | `float` | Dropout rate for regularization | `0.1` |
| `name` | `str` | Name for the layer | `None` |
Input shape
List of two tensors:
- Numerical features: (batch_size, num_samples, num_numerical_features)
- Categorical features: (batch_size, num_samples, num_categorical_features)
Output shape
List of two tensors with shapes:
- (batch_size, num_samples, d_model) (numerical features)
- (batch_size, num_samples, d_model) (categorical features)
Example
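A short usage example with synthetic inputs matching the documented shapes; the import path is assumed:

```python
import numpy as np
from kerasfactory.layers import MultiResolutionTabularAttention

# (batch_size, num_samples, num_features) for each feature type.
numerical = np.random.rand(32, 100, 8).astype("float32")
categorical = np.random.rand(32, 100, 5).astype("float32")

layer = MultiResolutionTabularAttention(num_heads=4, d_model=32, dropout_rate=0.1)
num_out, cat_out = layer([numerical, categorical])

assert num_out.shape == (32, 100, 32)
assert cat_out.shape == (32, 100, 32)
```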
Initialize the MultiResolutionTabularAttention.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_heads` | `int` | Number of attention heads. | required |
| `d_model` | `int` | Model dimension. | required |
| `dropout_rate` | `float` | Dropout rate. | `0.1` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/MultiResolutionTabularAttention.py
#### Functions
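This entry appears to document the layer's `compute_output_shape` override; a signature sketch consistent with the tables below:

```python
def compute_output_shape(
    self, input_shape: list[tuple[int, ...]]
) -> list[tuple[int, ...]]: ...
```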
Compute the output shape of the layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `list[tuple[int, ...]]` | List of shapes of the input tensors. | required |

Returns:

| Type | Description |
|---|---|
| `list[tuple[int, ...]]` | List of shapes of the output tensors. |
Source code in kerasfactory/layers/MultiResolutionTabularAttention.py
## Parameters Deep Dive

### num_heads (int)
- Purpose: Number of attention heads for parallel processing
- Range: 1 to 32+ (typically 4, 8, or 16)
- Impact: More heads = more cross-type patterns captured in parallel, at extra compute cost
- Recommendation: Start with 8, increase for complex mixed-type interactions
### d_model (int)
- Purpose: Dimensionality of the attention model
- Range: 32 to 512+ (must be divisible by num_heads)
- Impact: Higher values = richer cross-type representations
- Recommendation: Start with 64-128, scale based on data complexity
### dropout_rate (float)
- Purpose: Regularization to prevent overfitting
- Range: 0.0 to 0.9
- Impact: Higher values = more regularization for complex interactions
- Recommendation: Start with 0.1-0.2, adjust based on overfitting
## Performance Characteristics

- Speed: Fast for small to medium datasets; scales with feature complexity
- Memory: Higher memory usage due to the dual attention mechanisms
- Accuracy: Excellent for mixed-type tabular data with complex interactions
- Best For: Mixed-type tabular data requiring specialized feature processing
## Examples

### Example 1: E-commerce Recommendation
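A sketch of an e-commerce purchase-propensity model. Every feature name, shape, and hyperparameter below is illustrative rather than taken from the library:

```python
import keras
from keras import layers
from kerasfactory.layers import MultiResolutionTabularAttention

SESSIONS = 50   # recent events per user
NUM_FEATS = 6   # e.g. price, quantity, rating, days since last purchase
CAT_FEATS = 4   # e.g. encoded product category, user segment, channel, device

num_in = keras.Input(shape=(SESSIONS, NUM_FEATS), name="numerical_features")
cat_in = keras.Input(shape=(SESSIONS, CAT_FEATS), name="categorical_features")

num_att, cat_att = MultiResolutionTabularAttention(
    num_heads=8, d_model=128, dropout_rate=0.15
)([num_in, cat_in])

# Pool each pathway, fuse, and predict purchase propensity.
fused = layers.Concatenate()([
    layers.GlobalAveragePooling1D()(num_att),
    layers.GlobalAveragePooling1D()(cat_att),
])
x = layers.Dense(128, activation="relu")(fused)
x = layers.Dropout(0.2)(x)
x = layers.Dense(64, activation="relu")(x)
purchase_prob = layers.Dense(1, activation="sigmoid", name="purchase_probability")(x)

model = keras.Model(inputs=[num_in, cat_in], outputs=purchase_prob)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
```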
### Example 2: Medical Diagnosis
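A sketch of a multi-class diagnosis model that combines numerical lab values with encoded symptoms and demographics; the class count and feature layout are invented for illustration:

```python
import keras
from keras import layers
from kerasfactory.layers import MultiResolutionTabularAttention

VISITS = 10       # clinical visits per patient
LAB_FEATS = 12    # numerical lab values
CAT_FEATS = 6     # encoded symptoms and demographics
N_DIAGNOSES = 5

labs_in = keras.Input(shape=(VISITS, LAB_FEATS), name="lab_values")
cats_in = keras.Input(shape=(VISITS, CAT_FEATS), name="symptoms_demographics")

lab_att, cat_att = MultiResolutionTabularAttention(
    num_heads=4, d_model=64, dropout_rate=0.2
)([labs_in, cats_in])

# Pool both pathways, fuse, and classify into one of N_DIAGNOSES classes.
fused = layers.Concatenate()([
    layers.GlobalAveragePooling1D()(lab_att),
    layers.GlobalAveragePooling1D()(cat_att),
])
x = layers.Dense(64, activation="relu")(fused)
x = layers.Dropout(0.3)(x)
diagnosis = layers.Dense(N_DIAGNOSES, activation="softmax", name="diagnosis")(x)

model = keras.Model(inputs=[labs_in, cats_in], outputs=diagnosis)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```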
### Example 3: Financial Risk Assessment
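A sketch of a portfolio risk-scoring model mixing numerical market indicators with sector and rating classifications; again, all names and values are illustrative:

```python
import keras
from keras import layers
from kerasfactory.layers import MultiResolutionTabularAttention

POSITIONS = 30     # instruments per portfolio
MARKET_FEATS = 10  # e.g. returns, volatility, leverage ratios
CAT_FEATS = 5      # e.g. encoded sector, region, rating bucket

market_in = keras.Input(shape=(POSITIONS, MARKET_FEATS), name="market_indicators")
cat_in = keras.Input(shape=(POSITIONS, CAT_FEATS), name="classifications")

mkt_att, cat_att = MultiResolutionTabularAttention(
    num_heads=8, d_model=128, dropout_rate=0.1
)([market_in, cat_in])

# Pool both pathways, fuse, and regress a continuous risk score.
fused = layers.Concatenate()([
    layers.GlobalAveragePooling1D()(mkt_att),
    layers.GlobalAveragePooling1D()(cat_att),
])
x = layers.Dense(128, activation="relu")(fused)
x = layers.Dropout(0.2)(x)
risk_score = layers.Dense(1, activation="linear", name="risk_score")(x)

model = keras.Model(inputs=[market_in, cat_in], outputs=risk_score)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```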
## Tips & Best Practices
- Data Preprocessing: Ensure numerical features are normalized and categorical features are properly encoded (see the sketch after this list)
- Feature Balance: Maintain reasonable balance between numerical and categorical feature counts
- Head Configuration: Use more attention heads for complex cross-type interactions
- Regularization: Apply appropriate dropout to prevent overfitting in cross-attention
- Output Processing: Consider different pooling strategies for different feature types
- Monitoring: Track attention weights to understand cross-type learning
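For the preprocessing point above, a minimal sketch using standard Keras preprocessing layers (`Normalization` for numerical columns, `StringLookup` for categorical ones). The column names and values are placeholders, and the resulting 2-D arrays still need to be grouped into the `(batch_size, num_samples, features)` layout the attention layer expects.

```python
import numpy as np
from keras import layers

# Placeholder raw columns: two numerical, two categorical.
income = np.array([[42_000.0], [65_000.0], [31_000.0]], dtype="float32")
age = np.array([[34.0], [51.0], [27.0]], dtype="float32")
region = np.array([["north"], ["south"], ["north"]])
segment = np.array([["premium"], ["basic"], ["basic"]])

# Normalize numerical features so no single scale dominates the attention scores.
numerical_raw = np.concatenate([income, age], axis=-1)
normalizer = layers.Normalization()
normalizer.adapt(numerical_raw)
numerical = np.asarray(normalizer(numerical_raw))

# Integer-encode categorical features before feeding them to the layer.
region_lookup = layers.StringLookup(output_mode="int")
segment_lookup = layers.StringLookup(output_mode="int")
region_lookup.adapt(region)
segment_lookup.adapt(segment)
categorical = np.concatenate(
    [np.asarray(region_lookup(region)), np.asarray(segment_lookup(segment))],
    axis=-1,
)

print(numerical.shape, categorical.shape)  # (3, 2) (3, 2)
```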
## Common Pitfalls
- Input Format: Must provide exactly two inputs: [numerical_features, categorical_features]
- Shape Mismatch: Ensure both inputs have the same batch_size and num_samples dimensions
- Memory Usage: Higher memory consumption due to dual attention mechanisms
- Overfitting: Complex cross-attention can lead to overfitting on small datasets
- Feature Imbalance: Severe imbalance between feature types can hurt performance
## Related Layers
- TabularAttention - General tabular attention for uniform feature processing
- ColumnAttention - Column-wise attention for feature relationships
- AdvancedNumericalEmbedding - Specialized numerical feature processing
- DistributionAwareEncoder - Distribution-aware feature encoding
## Further Reading
- TabNet: Attentive Interpretable Tabular Learning - Tabular-specific attention mechanisms
- Attention Is All You Need - Original Transformer architecture
- KerasFactory Layer Explorer - Browse all available layers
- Mixed-Type Data Tutorial - Complete guide to mixed-type tabular modeling