# 🎯 VariableSelection
## 🎯 Overview
The VariableSelection layer implements dynamic feature selection using gated residual networks (GRNs). Unlike traditional feature selection methods that make static decisions, this layer learns to dynamically select and weight features based on the input context, making it particularly powerful for time series and tabular data where feature importance can vary.
This layer applies a gated residual network to each feature independently and learns feature weights through a softmax layer, optionally using a context vector to condition the feature selection process.
## 🔍 How It Works
The VariableSelection layer processes features through a sophisticated selection mechanism:
1. Feature Processing: Each feature is processed independently through a gated residual network
2. Weight Learning: A selection network learns a weight for each feature
3. Context Integration: An optional context vector conditions the selection
4. Softmax Weighting: A softmax normalizes the feature weights so they sum to one
5. Feature Aggregation: Features are combined into a single representation according to the learned weights
```mermaid
graph TD
    A[Input Features: batch_size, nr_features, feature_dim] --> B[Feature GRNs]
    C[Context Vector: batch_size, context_dim] --> D[Context Processing]
    B --> E[Feature Representations]
    D --> F[Context Representation]
    E --> G[Selection Network]
    F --> G
    G --> H[Feature Weights]
    H --> I[Softmax Normalization]
    I --> J[Weighted Feature Selection]
    E --> K[Feature Aggregation]
    J --> K
    K --> L[Selected Features + Weights]
    style A fill:#e6f3ff,stroke:#4a86e8
    style C fill:#fff9e6,stroke:#ffb74d
    style L fill:#e8f5e9,stroke:#66bb6a
    style G fill:#f3e5f5,stroke:#9c27b0
```
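The computation can be sketched with Keras ops. The snippet below is a simplification for intuition only: a shared Dense layer stands in for the per-feature GRNs, and a plain Dense over the flattened inputs stands in for the selection network. It is not the layer's actual implementation.

```python
import keras
from keras import ops

batch, nr_features, feature_dim = 32, 8, 16
x = keras.random.normal((batch, nr_features, feature_dim))

# 1. Per-feature processing (a shared Dense stands in for the GRNs here).
processed = keras.layers.Dense(feature_dim, activation="elu")(x)

# 2-4. A selection network scores each feature; softmax normalizes the scores.
flat = ops.reshape(x, (batch, nr_features * feature_dim))
scores = keras.layers.Dense(nr_features)(flat)  # (batch, nr_features)
weights = ops.softmax(scores, axis=-1)          # each row sums to 1

# 5. Aggregation: weighted sum over the feature axis.
selected = ops.sum(processed * ops.expand_dims(weights, -1), axis=1)
print(selected.shape)  # (32, 16)
print(weights.shape)   # (32, 8)
```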
## 💡 Why Use This Layer?
| Challenge | Traditional Approach | VariableSelection's Solution |
|---|---|---|
| Feature Selection | Static selection or manual feature engineering | 🎯 Dynamic selection that adapts to input context |
| Feature Importance | Fixed importance or post-hoc analysis | ⚡ Learned importance during training |
| Context Awareness | Ignore contextual information | 🧠 Context-conditioned selection using context vectors |
| Feature Interactions | Treat features independently | 🔄 Gated processing that considers feature relationships |
## 📈 Use Cases
- Time Series Forecasting: Selecting relevant features for different time periods
- Dynamic Feature Engineering: Adapting feature selection based on data patterns
- Context-Aware Modeling: Using external context to guide feature selection
- High-Dimensional Data: Intelligently reducing feature space
- Multi-Task Learning: Different feature selections for different tasks
## 🚀 Quick Start

### Basic Usage
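A minimal sketch, assuming the layer is importable from `kerasfactory.layers` and follows the signature documented in the API reference below:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

# 8 features, each represented by a 16-dimensional embedding
x = keras.random.normal((32, 8, 16))  # (batch_size, nr_features, feature_dim)

layer = VariableSelection(nr_features=8, units=32, dropout_rate=0.1)
selected, weights = layer(x)

print(selected.shape)  # (32, 16) aggregated feature representation
print(weights.shape)   # (32, 8)  softmax weight per feature
```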
### With Context Vector
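With `use_context=True` the layer takes a two-element list of features and context. Again a hedged sketch under the same import assumption:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

features = keras.random.normal((32, 8, 16))  # (batch_size, nr_features, feature_dim)
context = keras.random.normal((32, 10))      # (batch_size, context_dim)

layer = VariableSelection(
    nr_features=8,
    units=32,
    dropout_rate=0.1,
    use_context=True,
)
# Context-conditioned selection: inputs are passed as [features, context]
selected, weights = layer([features, context])

print(selected.shape)  # (32, 16)
print(weights.shape)   # (32, 8)
```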
### In a Sequential Model
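Because the layer returns a `(selected, weights)` tuple, it does not drop straight into `keras.Sequential`. One possible pattern is a thin wrapper that keeps only the selected features; this wrapper is an illustrative workaround, not an API provided by the library:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path


class SelectFeatures(keras.layers.Layer):
    """Wraps VariableSelection and discards the weights output."""

    def __init__(self, nr_features: int, units: int, **kwargs):
        super().__init__(**kwargs)
        self.vs = VariableSelection(nr_features=nr_features, units=units)

    def call(self, inputs):
        selected, _weights = self.vs(inputs)
        return selected


model = keras.Sequential([
    keras.Input(shape=(8, 16)),  # (nr_features, feature_dim)
    SelectFeatures(nr_features=8, units=32),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```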
### In a Functional Model
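In the functional API both outputs can be kept, which makes the selection weights inspectable at inference time (sketch under the same assumptions):

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

features = keras.Input(shape=(8, 16), name="features")  # (nr_features, feature_dim)
context = keras.Input(shape=(10,), name="context")      # (context_dim,)

selected, weights = VariableSelection(
    nr_features=8,
    units=32,
    dropout_rate=0.1,
    use_context=True,
)([features, context])

hidden = keras.layers.Dense(32, activation="relu")(selected)
prediction = keras.layers.Dense(1, name="prediction")(hidden)

# Expose the weights as a second output so they can be inspected later.
model = keras.Model(inputs=[features, context], outputs=[prediction, weights])
model.compile(optimizer="adam", loss=["mse", None])  # no loss on the weights
model.summary()
```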
### Advanced Configuration
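A more heavily parameterized configuration might look like this (all values are illustrative):

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

layer = VariableSelection(
    nr_features=20,       # must match the feature axis of the input
    units=128,            # larger selection network for complex patterns
    dropout_rate=0.3,     # stronger regularization for small datasets
    use_context=True,     # condition selection on an external context vector
    name="variable_selection",
)

features = keras.random.normal((64, 20, 32))  # (batch, nr_features, feature_dim)
context = keras.random.normal((64, 16))       # (batch, context_dim)

selected, weights = layer([features, context])
print(selected.shape)  # (64, 32)
print(weights.shape)   # (64, 20)
```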
## 📖 API Reference

### `kerasfactory.layers.VariableSelection`
This module implements a VariableSelection layer that applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It's particularly useful for dynamic feature selection in time series and tabular models.
#### Classes

##### `VariableSelection`
```python
VariableSelection(
    nr_features: int,
    units: int,
    dropout_rate: float = 0.1,
    use_context: bool = False,
    name: str | None = None,
    **kwargs: Any,
)
```
Layer for dynamic feature selection using gated residual networks.
This layer applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It can optionally use a context vector to condition the feature selection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `nr_features` | `int` | Number of input features | required |
| `units` | `int` | Number of hidden units in the gated residual network | required |
| `dropout_rate` | `float` | Dropout rate for regularization | `0.1` |
| `use_context` | `bool` | Whether to use a context vector for conditioning | `False` |
| `name` | `str` | Name for the layer | `None` |
**Input shape**

If `use_context` is False:

- Single tensor with shape: `(batch_size, nr_features, feature_dim)`

If `use_context` is True:

- List of two tensors:
    - Features tensor with shape: `(batch_size, nr_features, feature_dim)`
    - Context tensor with shape: `(batch_size, context_dim)`

**Output shape**

Tuple of two tensors:

- Selected features: `(batch_size, feature_dim)`
- Feature weights: `(batch_size, nr_features)`
**Example**
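A sketch of typical usage, assuming `kerasfactory.layers` exposes the class:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

features = keras.random.normal((32, 8, 16))

# Without context
layer = VariableSelection(nr_features=8, units=32)
selected, weights = layer(features)

# With context
context = keras.random.normal((32, 10))
ctx_layer = VariableSelection(nr_features=8, units=32, use_context=True)
selected, weights = ctx_layer([features, context])

print(selected.shape)  # (32, 16)
print(weights.shape)   # (32, 8)
```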
Initialize the VariableSelection layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `nr_features` | `int` | Number of input features. | required |
| `units` | `int` | Number of units in the selection network. | required |
| `dropout_rate` | `float` | Dropout rate. | `0.1` |
| `use_context` | `bool` | Whether to use context for selection. | `False` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in `kerasfactory/layers/VariableSelection.py`, lines 64-104.
#### Functions
```python
compute_output_shape(
    input_shape: tuple[int, ...] | list[tuple[int, ...]],
) -> list[tuple[int, ...]]
```
Compute the output shape of the layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `tuple[int, ...] \| list[tuple[int, ...]]` | Shape of the input tensor, or list of shapes if using context. | required |
Returns:

| Type | Description |
|---|---|
| `list[tuple[int, ...]]` | List of shapes for the output tensors. |
Source code in `kerasfactory/layers/VariableSelection.py`, lines 301-328.
## 🔧 Parameters Deep Dive

### `nr_features` (int)
- Purpose: Number of input features to select from
- Range: 1 to 1000+ (typically 5-50)
- Impact: Must match the number of features in your input
- Recommendation: Set to the actual number of features you want to select from
### `units` (int)
- Purpose: Number of hidden units in the selection network
- Range: 8 to 512+ (typically 16-128)
- Impact: Larger values = more complex selection patterns but more parameters
- Recommendation: Start with 32, scale based on feature complexity
### `dropout_rate` (float)
- Purpose: Regularization to prevent overfitting
- Range: 0.0 to 0.9
- Impact: Higher values = more regularization but potentially less learning
- Recommendation: Start with 0.1, increase if overfitting occurs
### `use_context` (bool)
- Purpose: Whether to use a context vector for conditioning
- Default: False
- Impact: Enables context-aware feature selection
- Recommendation: Use True when you have contextual information
## 📊 Performance Characteristics

- Speed: ⚡⚡⚡ Fast for small to medium feature counts; runtime scales with `nr_features`
- Memory: 💾💾💾 Moderate memory usage due to per-feature processing
- Accuracy: 🎯🎯🎯🎯 Excellent for dynamic feature selection tasks
- Best For: Time series and tabular data with varying feature importance
## 🎨 Examples

### Example 1: Time Series Feature Selection
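One possible time series setup: a set of embedded indicator features plus a calendar context vector that conditions which indicators matter at a given time. The names, dimensions, and import path are illustrative assumptions:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

NR_FEATURES, FEATURE_DIM, CONTEXT_DIM = 10, 8, 4  # hypothetical sizes

features = keras.Input(shape=(NR_FEATURES, FEATURE_DIM), name="indicators")
calendar = keras.Input(shape=(CONTEXT_DIM,), name="calendar_context")

# The calendar context steers which indicators are selected for the forecast.
selected, weights = VariableSelection(
    nr_features=NR_FEATURES,
    units=64,
    dropout_rate=0.1,
    use_context=True,
    name="feature_selection",
)([features, calendar])

hidden = keras.layers.Dense(64, activation="relu")(selected)
forecast = keras.layers.Dense(1, name="forecast")(hidden)

model = keras.Model([features, calendar], forecast)
model.compile(optimizer="adam", loss="mse")

# Smoke test on synthetic data
x_feat = keras.random.normal((256, NR_FEATURES, FEATURE_DIM))
x_cal = keras.random.normal((256, CONTEXT_DIM))
y = keras.random.normal((256, 1))
model.fit([x_feat, x_cal], y, epochs=1, batch_size=32, verbose=0)

# A separate read-out model exposes the learned selection weights.
weight_model = keras.Model([features, calendar], weights)
print(weight_model.predict([x_feat, x_cal], verbose=0).shape)  # (256, 10)
```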
### Example 2: Multi-Task Feature Selection
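For multi-task learning, each task head can own its own VariableSelection over shared inputs, so the tasks learn different selection patterns. A sketch with hypothetical task names:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

NR_FEATURES, FEATURE_DIM = 12, 16
features = keras.Input(shape=(NR_FEATURES, FEATURE_DIM), name="features")

# One selection layer per task: each learns its own feature weighting.
sel_reg, w_reg = VariableSelection(
    nr_features=NR_FEATURES, units=32, name="select_regression"
)(features)
sel_clf, w_clf = VariableSelection(
    nr_features=NR_FEATURES, units=32, name="select_classification"
)(features)

regression = keras.layers.Dense(1, name="regression")(
    keras.layers.Dense(32, activation="relu")(sel_reg)
)
classification = keras.layers.Dense(1, activation="sigmoid", name="classification")(
    keras.layers.Dense(32, activation="relu")(sel_clf)
)

model = keras.Model(features, {"regression": regression, "classification": classification})
model.compile(
    optimizer="adam",
    loss={"regression": "mse", "classification": "binary_crossentropy"},
)
model.summary()
```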
### Example 3: Feature Importance Analysis
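The weights output can be averaged over a batch to get a rough global importance profile; as the pitfalls section below notes, these are relative weights, not absolute importances. A sketch with hypothetical feature names:

```python
import keras
from kerasfactory.layers import VariableSelection  # assumed import path

NR_FEATURES, FEATURE_DIM = 8, 16
feature_names = [f"feature_{i}" for i in range(NR_FEATURES)]  # hypothetical names

layer = VariableSelection(nr_features=NR_FEATURES, units=32)
x = keras.random.normal((512, NR_FEATURES, FEATURE_DIM))
_selected, weights = layer(x)

# Average the per-sample softmax weights into a global importance profile.
mean_importance = keras.ops.convert_to_numpy(weights).mean(axis=0)

for name, score in sorted(
    zip(feature_names, mean_importance), key=lambda item: -item[1]
):
    print(f"{name}: {score:.3f}")
```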
## 💡 Tips & Best Practices
- Feature Dimension: Ensure feature_dim is consistent across all features
- Context Usage: Use context vectors when you have relevant contextual information
- Units Sizing: Start with units = nr_features * 2, adjust based on complexity
- Regularization: Use appropriate dropout to prevent overfitting
- Weight Analysis: Monitor feature weights to understand selection patterns
- Batch Size: Works best with larger batch sizes for stable weight learning
## ⚠️ Common Pitfalls
- Input Shape: Must be 3D tensor (batch_size, nr_features, feature_dim)
- Context Mismatch: Context vector must be 2D (batch_size, context_dim)
- Feature Count: nr_features must match actual number of input features
- Memory Usage: Scales with nr_features - be careful with large feature counts
- Weight Interpretation: Weights are relative, not absolute importance
## 🔗 Related Layers
- GatedFeatureSelection - Gated feature selection mechanism
- GatedResidualNetwork - Core GRN used in variable selection
- TabularAttention - Attention-based feature processing
- DistributionAwareEncoder - Distribution-aware feature encoding
## 📚 Further Reading
- Temporal Fusion Transformers - Original paper on variable selection
- Gated Residual Networks - GRN architecture details
- Feature Selection in Deep Learning - Feature selection concepts
- KerasFactory Layer Explorer - Browse all available layers
- Time Series Tutorial - Complete guide to time series modeling