# GatedFeatureFusion
## 🎯 Overview
The GatedFeatureFusion layer intelligently combines two feature representations using a learned gating mechanism. This is particularly useful when you have multiple representations of the same data (e.g., raw numerical features alongside their embeddings) and want to learn the optimal way to combine them.
The layer uses a learned gate to balance the contributions of both representations, allowing the model to dynamically decide how much to rely on each representation based on the input context.
## How It Works
The GatedFeatureFusion layer processes two feature representations through a sophisticated fusion mechanism:
- Input Concatenation: Combines both feature representations
- Gate Learning: Uses a dense layer to learn fusion weights
- Sigmoid Activation: Applies sigmoid to create gating values between 0 and 1
- Weighted Fusion: Weights one representation by the gate and the other by its complement, then sums them
- Output Generation: Produces the fused feature representation
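The steps above can be sketched in plain NumPy. This is an illustration of the math, not the layer's actual implementation; `W` and `b` stand in for the gate's learned dense weights, and it assumes the common convex-combination form `g * x1 + (1 - g) * x2`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(x1, x2, W, b):
    """Fuse two (batch, features) arrays with a learned gate."""
    concat = np.concatenate([x1, x2], axis=-1)  # (batch, 2 * features)
    gate = sigmoid(concat @ W + b)              # (batch, features), values in (0, 1)
    return gate * x1 + (1.0 - gate) * x2        # per-feature convex combination

rng = np.random.default_rng(0)
x1 = rng.normal(size=(4, 8))
x2 = rng.normal(size=(4, 8))
W = rng.normal(size=(16, 8)) * 0.1              # gate weights: (2 * features, features)
b = np.zeros(8)

fused = gated_fusion(x1, x2, W, b)              # same shape as each input: (4, 8)
```

Because the sigmoid gate lies in (0, 1), each fused value sits between the two input values for that feature.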
```mermaid
graph TD
    A[Feature Representation 1] --> C[Concatenation]
    B[Feature Representation 2] --> C
    C --> D[Fusion Gate Network]
    D --> E[Sigmoid Activation]
    E --> F[Gate Weights g]
    F --> N[Complement 1 - g]
    A --> G[Element-wise Multiplication]
    F --> G
    B --> H[Element-wise Multiplication]
    N --> H
    G --> I[Weighted Rep 1]
    H --> J[Weighted Rep 2]
    I --> K[Addition]
    J --> K
    K --> L[Fused Features]
    style A fill:#e6f3ff,stroke:#4a86e8
    style B fill:#fff9e6,stroke:#ffb74d
    style L fill:#e8f5e9,stroke:#66bb6a
    style D fill:#f3e5f5,stroke:#9c27b0
```
## 💡 Why Use This Layer?
| Challenge | Traditional Approach | GatedFeatureFusion's Solution |
|---|---|---|
| Multiple Representations | Simple concatenation or averaging | 🎯 Learned fusion that adapts to input context |
| Feature Redundancy | Treat all features equally | ⚡ Intelligent weighting to balance contributions |
| Representation Quality | No adaptation to representation quality | 🧠 Dynamic gating based on representation relevance |
| Information Loss | Fixed combination strategies | Preserves information from both representations |
## Use Cases
- Multi-Modal Data: Combining different data types (numerical + categorical embeddings)
- Feature Engineering: Fusing raw features with engineered features
- Ensemble Methods: Combining different model representations
- Transfer Learning: Fusing pre-trained features with task-specific features
- Data Augmentation: Combining original and augmented feature representations
## Quick Start

### Basic Usage
### In a Sequential Model
### In a Functional Model
### Advanced Configuration
## API Reference
kerasfactory.layers.GatedFeatureFusion
This module implements a GatedFeatureFusion layer that combines two feature representations through a learned gating mechanism. It's particularly useful for tabular datasets with multiple representations (e.g., raw numeric features alongside embeddings).
Classes
GatedFeatureFusion
Gated feature fusion layer for combining two feature representations.
This layer takes two inputs (e.g., numerical features and their embeddings) and fuses them using a learned gate to balance their contributions. The gate is computed using a dense layer with sigmoid activation, applied to the concatenation of both inputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `activation` | `str` | Activation function to use for the gate. Default is `'sigmoid'`. | `'sigmoid'` |
| `name` | `str \| None` | Optional name for the layer. | `None` |
Input shape
A list of 2 tensors with shape: [(batch_size, ..., features), (batch_size, ..., features)]
Both inputs must have the same shape.
Output shape
Tensor with shape: (batch_size, ..., features), same as each input.
Example
Initialize the GatedFeatureFusion layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `activation` | `str` | Activation function for the gate. | `'sigmoid'` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/GatedFeatureFusion.py
## 🔧 Parameters Deep Dive
activation (str)
- Purpose: Activation function for the fusion gate
- Options: 'sigmoid', 'tanh', 'relu', 'softmax', etc.
- Default: 'sigmoid'
- Impact: Controls the gating behavior and output range
- Recommendation: Use 'sigmoid' for balanced fusion, 'tanh' for signed gating
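The practical difference between the two recommended gate activations shows up in their output ranges, checked quickly in NumPy:

```python
import numpy as np

z = np.linspace(-4.0, 4.0, 9)            # example pre-activation gate logits
sigmoid_gate = 1.0 / (1.0 + np.exp(-z))  # range (0, 1): convex mixing weights
tanh_gate = np.tanh(z)                   # range (-1, 1): weights can flip sign

print(sigmoid_gate.round(2))
print(tanh_gate.round(2))
```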
## Performance Characteristics

- Speed: ⚡⚡⚡⚡ Very fast - simple dense layer computation
- Memory: 💾💾 Low memory usage - minimal additional parameters
- Accuracy: 🎯🎯🎯🎯 Excellent for multi-representation fusion
- Best For: Tabular data with multiple feature representations
## 🎨 Examples

### Example 1: Numerical + Categorical Embeddings
### Example 2: Multi-Scale Feature Fusion
### Example 3: Ensemble Feature Fusion
## 💡 Tips & Best Practices
- Representation Quality: Ensure both representations are meaningful and complementary
- Feature Alignment: Both inputs must have the same feature dimension
- Activation Choice: Use 'sigmoid' for balanced fusion, 'tanh' for signed gating
- Regularization: Combine with dropout to prevent overfitting
- Interpretability: Monitor gate values to understand fusion behavior
- Data Preprocessing: Ensure both representations are properly normalized
## ⚠️ Common Pitfalls
- Shape Mismatch: Both inputs must have identical shapes
- Input Format: Must provide exactly two inputs as a list
- Representation Quality: Poor representations will lead to poor fusion
- Overfitting: Can overfit on small datasets - use regularization
- Gate Interpretation: Gate values are relative, not absolute importance
## Related Layers
- GatedFeatureSelection - Gated feature selection mechanism
- VariableSelection - Dynamic feature selection
- AdvancedNumericalEmbedding - Advanced numerical embeddings
- TabularAttention - Attention-based feature processing
## Further Reading
- Gated Residual Networks - GRN architecture details
- Feature Fusion in Deep Learning - Feature fusion concepts
- Multi-Modal Learning - Multi-modal learning approaches
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering