# TabularAttention
## Overview
The TabularAttention layer implements a sophisticated dual attention mechanism specifically designed for tabular data. Unlike traditional attention mechanisms that focus on sequential data, this layer captures both inter-feature relationships (how features interact within each sample) and inter-sample relationships (how samples relate to each other across features).
This layer is particularly powerful for tabular datasets where understanding feature interactions and sample similarities is crucial for making accurate predictions. It's especially useful in scenarios where you have complex feature dependencies that traditional neural networks struggle to capture.
## How It Works
The TabularAttention layer processes tabular data through a two-stage attention mechanism:
- Inter-Feature Attention: Analyzes relationships between different features within each sample
- Inter-Sample Attention: Examines relationships between different samples across features
```mermaid
graph TD
    A[Input: batch_size, num_samples, num_features] --> B[Input Projection to d_model]
    B --> C[Inter-Feature Attention]
    C --> D[Feature LayerNorm + Residual]
    D --> E[Feed-Forward Network]
    E --> F[Feature LayerNorm + Residual]
    F --> G[Inter-Sample Attention]
    G --> H[Sample LayerNorm + Residual]
    H --> I[Output Projection]
    I --> J[Output: batch_size, num_samples, d_model]

    style A fill:#e6f3ff,stroke:#4a86e8
    style J fill:#e8f5e9,stroke:#66bb6a
    style C fill:#fff9e6,stroke:#ffb74d
    style G fill:#fff9e6,stroke:#ffb74d
```
## Why Use This Layer?
| Challenge | Traditional Approach | TabularAttention's Solution |
|---|---|---|
| Feature Interactions | Manual feature engineering or simple concatenation | Automatic discovery of complex feature relationships through attention |
| Sample Relationships | Treating samples independently | Cross-sample learning to identify similar patterns and outliers |
| High-Dimensional Data | Dimensionality reduction or feature selection | Efficient attention that scales to high-dimensional tabular data |
| Interpretability | Black-box models with limited insights | Attention weights provide insights into feature and sample importance |
## Use Cases
- Financial Risk Assessment: Understanding how different financial indicators interact and identifying similar risk profiles
- Medical Diagnosis: Capturing complex relationships between symptoms and patient characteristics
- Recommendation Systems: Learning user-item interactions and finding similar users/items
- Anomaly Detection: Identifying unusual patterns by comparing samples across features
- Feature Engineering: Automatically discovering meaningful feature combinations
## Quick Start
### Basic Usage
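A minimal standalone sketch, assuming the layer is importable from `kerasfactory.layers` and applied directly to a 3D tensor of shape `(batch_size, num_samples, num_features)`:

```python
import numpy as np
from kerasfactory.layers import TabularAttention  # assumed import path

# Toy batch: 32 tables, each with 100 samples (rows) and 16 features (columns)
x = np.random.rand(32, 100, 16).astype("float32")

# num_heads and d_model are illustrative starting values
attention = TabularAttention(num_heads=8, d_model=64, dropout_rate=0.1)
y = attention(x)

print(y.shape)  # expected: (32, 100, 64) -> (batch_size, num_samples, d_model)
```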
### In a Sequential Model
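A sketch of the layer inside `keras.Sequential`; the pooling layer and dense head are illustrative choices, not part of the layer itself:

```python
import keras
from kerasfactory.layers import TabularAttention  # assumed import path

model = keras.Sequential([
    keras.Input(shape=(100, 16)),                # (num_samples, num_features)
    TabularAttention(num_heads=4, d_model=64, dropout_rate=0.1),
    keras.layers.GlobalAveragePooling1D(),       # pool over the sample axis
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```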
### In a Functional Model
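A functional-API sketch under the same assumptions, with a small dense block before the attention layer and a pooled prediction head after it:

```python
import keras
from kerasfactory.layers import TabularAttention  # assumed import path

inputs = keras.Input(shape=(100, 16), name="tabular_input")

# Initial feature processing before attention (illustrative)
x = keras.layers.Dense(32, activation="relu")(inputs)

# Dual attention over features and samples
x = TabularAttention(num_heads=8, d_model=64, dropout_rate=0.1)(x)

# Collapse the sample axis and predict
x = keras.layers.GlobalAveragePooling1D()(x)
outputs = keras.layers.Dense(1, activation="sigmoid", name="prediction")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```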
### Advanced Configuration
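A sketch of a heavier configuration; the head count, dimensions, learning rate, and gradient clipping are illustrative values rather than tuned recommendations:

```python
import keras
from kerasfactory.layers import TabularAttention  # assumed import path

attention = TabularAttention(
    num_heads=16,
    d_model=256,          # keep divisible by num_heads
    dropout_rate=0.2,     # stronger regularization for larger models
    name="tabular_attention_block",
)

inputs = keras.Input(shape=(200, 64))
x = attention(inputs)
x = keras.layers.GlobalAveragePooling1D()(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),  # clipping for stability
    loss="mse",
)
```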
## API Reference
### kerasfactory.layers.TabularAttention
This module implements a TabularAttention layer that applies inter-feature and inter-sample attention mechanisms for tabular data. It's particularly useful for capturing complex relationships between features and samples in tabular datasets.
#### Classes

##### TabularAttention
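A sketch of the constructor signature as implied by the parameter table below (not the library's source code):

```python
from typing import Any

import keras


class TabularAttention(keras.layers.Layer):
    # Constructor signature reconstructed from the documented parameters (sketch only).
    def __init__(
        self,
        num_heads: int,
        d_model: int,
        dropout_rate: float = 0.1,
        name: str | None = None,
        **kwargs: Any,
    ) -> None: ...
```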
Custom layer to apply inter-feature and inter-sample attention for tabular data.
This layer implements a dual attention mechanism:

1. Inter-feature attention: Captures dependencies between features for each sample
2. Inter-sample attention: Captures dependencies between samples for each feature
The layer uses MultiHeadAttention for both attention mechanisms and includes layer normalization, dropout, and a feed-forward network.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_heads` | `int` | Number of attention heads | *required* |
| `d_model` | `int` | Dimensionality of the attention model | *required* |
| `dropout_rate` | `float` | Dropout rate for regularization | `0.1` |
| `name` | `str` | Name for the layer | `None` |
Input shape: tensor with shape `(batch_size, num_samples, num_features)`

Output shape: tensor with shape `(batch_size, num_samples, d_model)`
Example
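A small usage sketch consistent with the documented shapes; the values and import path are illustrative:

```python
import numpy as np
from kerasfactory.layers import TabularAttention  # assumed import path

# 4 tables, each with 10 samples and 8 features
data = np.random.rand(4, 10, 8).astype("float32")

layer = TabularAttention(num_heads=2, d_model=16, dropout_rate=0.1)
outputs = layer(data)

print(outputs.shape)  # expected: (4, 10, 16)
```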
Initialize the TabularAttention layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_heads` | `int` | Number of attention heads. | *required* |
| `d_model` | `int` | Model dimension. | *required* |
| `dropout_rate` | `float` | Dropout rate. | `0.1` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/TabularAttention.py
#### Functions
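The documented method is the standard Keras `compute_output_shape` hook; a signature sketch consistent with the tables below:

```python
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]: ...
```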
Compute the output shape of the layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `tuple[int, ...]` | Shape of the input tensor. | *required* |

Returns:

| Type | Description |
|---|---|
| `tuple[int, ...]` | Shape of the output tensor. |
Source code in kerasfactory/layers/TabularAttention.py
## Parameters Deep Dive
### `num_heads` (int)
- Purpose: Number of attention heads for parallel processing
- Range: 1 to 64+ (typically 4, 8, or 16)
- Impact: More heads = better pattern recognition but higher computational cost
- Recommendation: Start with 8, increase if you have complex feature interactions
### `d_model` (int)
- Purpose: Dimensionality of the attention model
- Range: 32 to 512+ (must be divisible by num_heads)
- Impact: Higher values = richer representations but more parameters
- Recommendation: Start with 64-128, scale based on your data complexity
### `dropout_rate` (float)
- Purpose: Regularization to prevent overfitting
- Range: 0.0 to 0.9
- Impact: Higher values = more regularization but potentially less learning
- Recommendation: Start with 0.1, increase if overfitting occurs
## Performance Characteristics
- Speed: Fast for small to medium datasets, scales well with parallel processing
- Memory: Moderate memory usage due to attention computations
- Accuracy: Excellent for complex tabular data with feature interactions
- Best For: Tabular data with complex feature relationships and sample similarities
## Examples
### Example 1: Customer Segmentation
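A sketch of per-customer segmentation on synthetic data; the shapes, feature counts, and four-segment target are illustrative assumptions:

```python
import numpy as np
import keras
from kerasfactory.layers import TabularAttention  # assumed import path

# 20 groups of 50 customers, 20 behavioural/demographic features each
x = np.random.rand(20, 50, 20).astype("float32")
y = np.random.randint(0, 4, size=(20, 50))        # one of 4 segments per customer

inputs = keras.Input(shape=(50, 20))
h = TabularAttention(num_heads=4, d_model=64, dropout_rate=0.1)(inputs)
outputs = keras.layers.Dense(4, activation="softmax")(h)   # per-customer segment probabilities

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=4)
```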
### Example 2: Time Series Forecasting
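A forecasting sketch that treats each window of time steps as the sample axis; the window length, indicator count, and pooled regression head are assumptions:

```python
import numpy as np
import keras
from kerasfactory.layers import TabularAttention  # assumed import path

# 256 windows of 30 time steps, 8 indicators per step
x = np.random.rand(256, 30, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")      # next-step target

inputs = keras.Input(shape=(30, 8))
h = TabularAttention(num_heads=4, d_model=32, dropout_rate=0.1)(inputs)
h = keras.layers.GlobalAveragePooling1D()(h)      # pool over time steps
outputs = keras.layers.Dense(1)(h)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, batch_size=32)
```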
### Example 3: Multi-Task Learning
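A multi-task sketch with one shared attention block feeding a regression head and a classification head; the head names, losses, and data shapes are illustrative:

```python
import numpy as np
import keras
from kerasfactory.layers import TabularAttention  # assumed import path

x = np.random.rand(512, 40, 12).astype("float32")
y_reg = np.random.rand(512, 1).astype("float32")                   # regression target
y_clf = np.random.randint(0, 2, size=(512, 1)).astype("float32")   # binary target

inputs = keras.Input(shape=(40, 12))
shared = TabularAttention(num_heads=8, d_model=64, dropout_rate=0.1)(inputs)
shared = keras.layers.GlobalAveragePooling1D()(shared)

reg_out = keras.layers.Dense(1, name="regression")(shared)
clf_out = keras.layers.Dense(1, activation="sigmoid", name="classification")(shared)

model = keras.Model(inputs, [reg_out, clf_out])
model.compile(
    optimizer="adam",
    loss={"regression": "mse", "classification": "binary_crossentropy"},
)
model.fit(x, {"regression": y_reg, "classification": y_clf}, epochs=2, batch_size=32)
```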
## Tips & Best Practices
- Start Simple: Begin with 4-8 attention heads and d_model=64, then scale up
- Data Preprocessing: Ensure your tabular data is properly normalized before applying attention
- Batch Size: Use larger batch sizes (32+) for better attention learning
- Layer Order: Place TabularAttention after initial feature processing but before final predictions
- Regularization: Use dropout and layer normalization to prevent overfitting
- Monitoring: Watch attention weights to understand what the model is learning
## Common Pitfalls
- Memory Issues: Large d_model values can cause memory problems - start smaller
- Overfitting: Too many heads or too high d_model can lead to overfitting on small datasets
- Input Shape: Ensure input is 3D: (batch_size, num_samples, num_features)
- Divisibility: d_model must be divisible by num_heads
- Gradient Issues: Use gradient clipping if training becomes unstable
## Related Layers
- MultiResolutionTabularAttention - Multi-scale attention for different feature granularities
- ColumnAttention - Focused column-wise attention mechanism
- RowAttention - Specialized row-wise attention for sample relationships
- VariableSelection - Feature selection that works well with attention layers
## Further Reading
- Attention Is All You Need - Original Transformer paper
- TabNet: Attentive Interpretable Tabular Learning - Tabular-specific attention mechanisms
- KerasFactory Layer Explorer - Browse all available layers
- Tabular Data Tutorial - Complete guide to tabular modeling