ColumnAttention
Overview
The ColumnAttention layer implements a column-wise attention mechanism that dynamically weights features based on their importance and context. Unlike traditional attention mechanisms that focus on sequence relationships, this layer learns to assign attention weights to each feature (column) in tabular data, allowing the model to focus on the most relevant features for each prediction.
This layer is particularly useful for feature selection, interpretability, and improving model performance by learning which features are most important for each sample.
How It Works
The ColumnAttention layer processes tabular data through a feature-wise attention mechanism, summarized in the steps below and sketched in code after the diagram:
- Feature Analysis: Analyzes all input features to understand their importance
- Attention Weight Generation: Uses a neural network to compute attention weights for each feature
- Dynamic Weighting: Applies learned weights to scale feature importance
- Weighted Output: Returns the input features scaled by their attention weights
```mermaid
graph TD
    A[Input: batch_size, num_features] --> B[Feature Analysis]
    B --> C[Attention Network]
    C --> D[Softmax Activation]
    D --> E[Attention Weights]
    A --> F[Element-wise Multiplication]
    E --> F
    F --> G[Weighted Features Output]

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style C fill:#fff9e6,stroke:#ffb74d
    style E fill:#f3e5f5,stroke:#9c27b0
```
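Conceptually, the mechanism boils down to a small two-layer network that emits one softmax-normalized weight per column, which then scales the input element-wise. The sketch below illustrates this idea with plain Keras layers; it is not the library's actual implementation, and all names in it are illustrative.

```python
import numpy as np
import keras
from keras import layers

# Illustrative sketch of column attention (not the kerasfactory implementation).
num_features, hidden_dim = 8, 4

inputs = keras.Input(shape=(num_features,))
# Feature analysis: a small hidden layer looks at all features jointly.
hidden = layers.Dense(hidden_dim, activation="relu")(inputs)
# Attention weights: one value per feature, normalized with softmax.
attn = layers.Dense(num_features, activation="softmax")(hidden)
# Dynamic weighting: each feature is scaled by its attention weight.
weighted = layers.Multiply()([inputs, attn])

sketch = keras.Model(inputs, weighted)
print(sketch(np.random.rand(2, num_features).astype("float32")).shape)  # (2, 8)
```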
Why Use This Layer?
| Challenge | Traditional Approach | ColumnAttention's Solution |
|---|---|---|
| Feature Importance | Manual feature selection or uniform treatment | Automatic learning of feature importance per sample |
| Dynamic Weighting | Static feature weights or simple normalization | Context-aware feature weighting based on input |
| Interpretability | Black-box feature processing | Transparent attention weights show feature importance |
| Noise Reduction | All features treated equally | Automatic filtering of less important features |
Use Cases
- Feature Selection: Automatically identifying and emphasizing important features
- Noise Reduction: Down-weighting irrelevant or noisy features
- Interpretability: Understanding which features drive predictions
- Data Quality: Handling datasets with varying feature importance
- Model Regularization: Preventing overfitting by focusing on important features
Quick Start
Basic Usage
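A minimal sketch of standalone usage, assuming the layer is importable as `from kerasfactory.layers import ColumnAttention`:

```python
import numpy as np
from kerasfactory.layers import ColumnAttention  # assumed import path

# A batch of 32 samples with 16 features each.
x = np.random.rand(32, 16).astype("float32")

# input_dim must match the number of features; hidden_dim defaults to 16 // 2 = 8.
attention = ColumnAttention(input_dim=16)
weighted = attention(x)

print(weighted.shape)  # (32, 16) - same shape, each feature rescaled by its attention weight
```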
In a Sequential Model
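A sketch of placing the layer inside a `keras.Sequential` stack (same assumed import path):

```python
import keras
from keras import layers
from kerasfactory.layers import ColumnAttention  # assumed import path

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    ColumnAttention(input_dim=32),  # input_dim matches the previous layer's output size
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```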
In a Functional Model
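A sketch of the same idea with the functional API (assumed import path):

```python
import keras
from keras import layers
from kerasfactory.layers import ColumnAttention  # assumed import path

inputs = keras.Input(shape=(10,), name="tabular_features")
x = layers.Dense(24, activation="relu")(inputs)
x = ColumnAttention(input_dim=24)(x)  # weight the 24 intermediate features
x = layers.Dense(12, activation="relu")(x)
outputs = layers.Dense(1)(x)

model = keras.Model(inputs, outputs, name="column_attention_regressor")
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```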
Advanced Configuration
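A sketch of a more heavily configured setup: an explicit `hidden_dim`, a layer name for later inspection, and regularization around the attention block (illustrative values, assumed import path):

```python
import keras
from keras import layers
from kerasfactory.layers import ColumnAttention  # assumed import path

inputs = keras.Input(shape=(64,))

x = layers.BatchNormalization()(inputs)
# Wider attention network than the default 64 // 2 = 32 hidden units.
x = ColumnAttention(input_dim=64, hidden_dim=48, name="feature_attention")(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(3, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```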
API Reference
kerasfactory.layers.ColumnAttention
Column attention mechanism for weighting features dynamically.
Classes
ColumnAttention
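The constructor signature, reconstructed from the parameter table below (not copied from the source):

```python
ColumnAttention(input_dim, hidden_dim=None, **kwargs)
```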
Column attention mechanism to weight features dynamically.
This layer applies attention weights to each feature (column) in the input tensor. The attention weights are computed using a two-layer neural network that takes the input features and outputs attention weights for each feature.
Example
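A short example in the spirit of the class docstring (reconstructed sketch, assumed import path):

```python
import numpy as np
from kerasfactory.layers import ColumnAttention  # assumed import path

x = np.random.rand(4, 10).astype("float32")

layer = ColumnAttention(input_dim=10)  # hidden_dim defaults to 10 // 2 = 5
y = layer(x)

print(y.shape)  # (4, 10)
```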
Initialize column attention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_dim` | `int` | Input dimension | required |
| `hidden_dim` | `int \| None` | Hidden layer dimension. If None, uses input_dim // 2 | `None` |
| `**kwargs` | `dict[str, Any]` | Additional layer arguments | `{}` |
Source code in kerasfactory/layers/ColumnAttention.py
Functions
from_config classmethod
Create layer from configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `dict[str, Any]` | Layer configuration dictionary | required |
Returns:
| Type | Description |
|---|---|
| `ColumnAttention` | ColumnAttention instance |
Source code in kerasfactory/layers/ColumnAttention.py
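A serialization round trip using the standard Keras `get_config` / `from_config` contract (sketch; the config keys are assumed to mirror the constructor arguments):

```python
from kerasfactory.layers import ColumnAttention  # assumed import path

layer = ColumnAttention(input_dim=16, hidden_dim=8)

config = layer.get_config()                    # plain dict, safe to serialize
restored = ColumnAttention.from_config(config)

print(restored.get_config()["input_dim"])      # 16 (key name assumed to match the argument)
```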
Parameters Deep Dive
input_dim (int)
- Purpose: Number of input features to apply attention to
- Range: 1 to 1000+ (typically 10-100)
- Impact: Must match the number of features in your input
- Recommendation: Set to the output dimension of your previous layer
hidden_dim (int, optional)
- Purpose: Size of the hidden layer in the attention network
- Range: 1 to input_dim (default: input_dim // 2)
- Impact: Larger values = more complex attention patterns but more parameters
- Recommendation: Start with the default and increase it for complex feature interactions (see the sketch below)
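For example (illustrative values, assumed import path):

```python
from kerasfactory.layers import ColumnAttention  # assumed import path

# Default hidden_dim: the attention network's hidden layer gets 32 // 2 = 16 units.
default_attn = ColumnAttention(input_dim=32)

# Explicit hidden_dim: more capacity for complex feature interactions, more parameters.
wide_attn = ColumnAttention(input_dim=32, hidden_dim=24)
```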
Performance Characteristics
- Speed: Very fast - simple neural network computation
- Memory: Low memory usage - minimal additional parameters
- Accuracy: Good for feature importance and noise reduction
- Best For: Tabular data where feature importance varies by sample
Examples
Example 1: Feature Importance Analysis
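A sketch of such an analysis. Because the layer returns the input scaled element-wise by its attention weights, the weights for a batch can be recovered by dividing the layer's output by its (non-zero) input; this relies only on the documented behavior, not on the layer's internals. Data, shapes, and the import path are illustrative assumptions.

```python
import numpy as np
import keras
from keras import layers
from kerasfactory.layers import ColumnAttention  # assumed import path

num_features = 10
x_train = np.random.rand(256, num_features).astype("float32") + 0.1  # strictly positive
y_train = (x_train[:, 0] + 0.5 * x_train[:, 3] > 1.0).astype("float32")

inputs = keras.Input(shape=(num_features,))
attended = ColumnAttention(input_dim=num_features, name="column_attention")(inputs)
hidden = layers.Dense(16, activation="relu")(attended)
outputs = layers.Dense(1, activation="sigmoid")(hidden)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)

# Recover attention weights: output = input * weights  =>  weights = output / input.
probe = keras.Model(inputs, attended)
weights = probe.predict(x_train, verbose=0) / x_train
print("mean attention per feature:", weights.mean(axis=0).round(3))
```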
Example 2: Multi-Task Learning with Feature Attention
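A sketch of a two-head model where each task gets its own ColumnAttention branch over a shared representation (illustrative shapes and loss weights, assumed import path):

```python
import keras
from keras import layers
from kerasfactory.layers import ColumnAttention  # assumed import path

inputs = keras.Input(shape=(32,), name="features")
shared = layers.Dense(64, activation="relu")(inputs)

# Each task head learns its own weighting over the shared features.
reg_branch = ColumnAttention(input_dim=64, name="regression_attention")(shared)
reg_branch = layers.Dense(32, activation="relu")(reg_branch)
regression = layers.Dense(1, name="regression")(reg_branch)

clf_branch = ColumnAttention(input_dim=64, name="classification_attention")(shared)
clf_branch = layers.Dense(32, activation="relu")(clf_branch)
classification = layers.Dense(3, activation="softmax", name="classification")(clf_branch)

model = keras.Model(inputs, [regression, classification])
model.compile(
    optimizer="adam",
    loss={"regression": "mse", "classification": "sparse_categorical_crossentropy"},
    loss_weights={"regression": 1.0, "classification": 0.5},
)
```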
Example 3: Noisy Data Handling
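A sketch with synthetic data where only a handful of columns carry signal and the rest are noise (illustrative data and shapes, assumed import path):

```python
import numpy as np
import keras
from keras import layers
from kerasfactory.layers import ColumnAttention  # assumed import path

rng = np.random.default_rng(0)
n_samples, n_informative, n_noise = 1000, 5, 15
num_features = n_informative + n_noise

# Only the first 5 columns carry signal; the remaining 15 are pure noise.
signal = rng.normal(size=(n_samples, n_informative)).astype("float32")
noise = rng.normal(size=(n_samples, n_noise)).astype("float32")
x = np.concatenate([signal, noise], axis=1)
y = (signal.sum(axis=1) > 0).astype("float32")

inputs = keras.Input(shape=(num_features,))
weighted = ColumnAttention(input_dim=num_features)(inputs)  # learns to down-weight noisy columns
hidden = layers.Dense(32, activation="relu")(weighted)
hidden = layers.Dropout(0.2)(hidden)
outputs = layers.Dense(1, activation="sigmoid")(hidden)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, validation_split=0.2, epochs=10, batch_size=64, verbose=0)
```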
Tips & Best Practices
- Placement: Use after initial feature processing but before final predictions
- Hidden Dimension: Start with input_dim // 2, adjust based on complexity
- Regularization: Combine with dropout and batch normalization for better generalization
- Interpretability: Access attention weights to understand feature importance
- Data Quality: Particularly effective with noisy or high-dimensional data
- Monitoring: Track attention weight distributions during training (see the callback sketch after this list)
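For the monitoring tip, one option is a small callback that logs attention-weight statistics on a fixed reference batch after each epoch. The sketch below relies only on the documented behavior that the output equals the input scaled by the attention weights; `probe_model` is a hypothetical helper, built for example as `keras.Model(model.input, model.get_layer("column_attention").output)` using the name given to the layer at construction.

```python
import keras

class ColumnAttentionMonitor(keras.callbacks.Callback):
    """Logs attention-weight statistics on a fixed reference batch after each epoch."""

    def __init__(self, probe_model, reference_batch):
        super().__init__()
        self.probe = probe_model        # maps model inputs -> ColumnAttention output
        self.x_ref = reference_batch    # features must be strictly non-zero

    def on_epoch_end(self, epoch, logs=None):
        # output = input * weights  =>  weights = output / input (element-wise).
        weights = self.probe.predict(self.x_ref, verbose=0) / self.x_ref
        per_feature = weights.mean(axis=0)
        print(
            f"epoch {epoch}: attention min={per_feature.min():.3f} "
            f"max={per_feature.max():.3f} std={per_feature.std():.3f}"
        )
```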
Common Pitfalls
- Input Shape: Must be 2D tensor (batch_size, num_features)
- Dimension Mismatch: input_dim must match the number of features
- Overfitting: Can overfit on small datasets - use regularization
- Memory: Hidden dimension affects memory usage - keep reasonable
- Interpretation: Attention weights are relative, not absolute importance
Related Layers
- RowAttention - Row-wise attention for sample relationships
- TabularAttention - General tabular attention mechanism
- VariableSelection - Feature selection layer
- SparseAttentionWeighting - Sparse attention weights
Further Reading
- Attention Mechanisms in Deep Learning - Understanding attention mechanisms
- Feature Selection in Machine Learning - Feature selection concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering