
🎯 VariableSelection

🟡 Intermediate ✅ Stable 🔥 Popular

🎯 Overview

The VariableSelection layer implements dynamic feature selection using gated residual networks (GRNs). Unlike traditional feature selection methods that make static decisions, this layer learns to dynamically select and weight features based on the input context, making it particularly powerful for time series and tabular data where feature importance can vary.

This layer applies a gated residual network to each feature independently and learns feature weights through a softmax layer, optionally using a context vector to condition the feature selection process.

πŸ” How It Works

The VariableSelection layer processes features through a sophisticated selection mechanism:

  1. Feature Processing: Each feature is processed independently through a gated residual network
  2. Weight Learning: A selection network learns weights for each feature
  3. Context Integration: Optionally uses a context vector to condition the selection
  4. Softmax Weighting: Applies softmax to normalize feature weights
  5. Feature Aggregation: Combines features based on learned weights (a minimal sketch of this weighting step follows the diagram)

graph TD
    A[Input Features: batch_size, nr_features, feature_dim] --> B[Feature GRNs]
    C[Context Vector: batch_size, context_dim] --> D[Context Processing]

    B --> E[Feature Representations]
    D --> F[Context Representation]

    E --> G[Selection Network]
    F --> G

    G --> H[Feature Weights]
    H --> I[Softmax Normalization]
    I --> J[Weighted Feature Selection]

    E --> K[Feature Aggregation]
    J --> K
    K --> L[Selected Features + Weights]

    style A fill:#e6f3ff,stroke:#4a86e8
    style C fill:#fff9e6,stroke:#ffb74d
    style L fill:#e8f5e9,stroke:#66bb6a
    style G fill:#f3e5f5,stroke:#9c27b0
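
Taken together, the weighting and aggregation steps (2, 4, and 5) reduce to a softmax-weighted sum of the per-feature GRN outputs. Here is a minimal sketch of that step using keras.ops, with random stand-ins for the GRN outputs and selection logits; it illustrates the math, not the library's internal code:

import keras
from keras import ops

batch_size, nr_features, feature_dim = 4, 3, 8

# Stand-ins for the per-feature GRN outputs and the selection-network logits
processed = keras.random.normal((batch_size, nr_features, feature_dim))
logits = keras.random.normal((batch_size, nr_features))

weights = ops.softmax(logits, axis=-1)                                # (batch, nr_features)
selected = ops.sum(processed * ops.expand_dims(weights, -1), axis=1)  # (batch, feature_dim)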

💡 Why Use This Layer?

| Challenge | Traditional Approach | VariableSelection's Solution |
| --- | --- | --- |
| Feature Selection | Static selection or manual feature engineering | 🎯 Dynamic selection that adapts to input context |
| Feature Importance | Fixed importance or post-hoc analysis | ⚡ Learned importance during training |
| Context Awareness | Ignore contextual information | 🧠 Context-conditioned selection using context vectors |
| Feature Interactions | Treat features independently | 🔗 Gated processing that considers feature relationships |
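
To make the first row concrete: the same features paired with two different context vectors generally produce different feature weights. A quick eager-mode check (a hedged sketch; the shapes and dimensions are arbitrary):

import keras
from kerasfactory.layers import VariableSelection

vs = VariableSelection(nr_features=5, units=16, use_context=True)

features = keras.random.normal((1, 5, 8))
context_a = keras.random.normal((1, 32))
context_b = keras.random.normal((1, 32))

_, weights_a = vs([features, context_a])
_, weights_b = vs([features, context_b])
# weights_a and weights_b typically differ: the selection adapts to the context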

📊 Use Cases

  • Time Series Forecasting: Selecting relevant features for different time periods
  • Dynamic Feature Engineering: Adapting feature selection based on data patterns
  • Context-Aware Modeling: Using external context to guide feature selection
  • High-Dimensional Data: Intelligently reducing feature space
  • Multi-Task Learning: Different feature selections for different tasks

🚀 Quick Start

Basic Usage

import keras
from kerasfactory.layers import VariableSelection

# Create sample input data
batch_size, nr_features, feature_dim = 32, 10, 16
x = keras.random.normal((batch_size, nr_features, feature_dim))

# Apply variable selection
vs = VariableSelection(nr_features=nr_features, units=32, dropout_rate=0.1)
selected_features, feature_weights = vs(x)

print(f"Selected features shape: {selected_features.shape}")  # (32, 16)
print(f"Feature weights shape: {feature_weights.shape}")      # (32, 10)

With Context Vector

# Create data with context
features = keras.random.normal((32, 10, 16))
context = keras.random.normal((32, 64))  # 64-dimensional context

# Apply variable selection with context
vs_context = VariableSelection(
    nr_features=10, 
    units=32, 
    dropout_rate=0.1, 
    use_context=True
)
selected, weights = vs_context([features, context])

print(f"Selected features shape: {selected.shape}")  # (32, 16)
print(f"Feature weights shape: {weights.shape}")     # (32, 10)

In a Sequential Model

Note: VariableSelection returns two tensors (selected features and feature weights), so it cannot be stacked directly inside keras.Sequential. The equivalent single-path pipeline is wired with the functional API:

import keras
from kerasfactory.layers import VariableSelection

# Create a model with variable selection (functional wiring of the same stack)
inputs = keras.Input(shape=(32,))  # arbitrary input width for illustration
x = keras.layers.Dense(32, activation='relu')(inputs)
x = keras.layers.Reshape((1, 32))(x)  # reshape to (batch, nr_features, feature_dim)
selected, _ = VariableSelection(nr_features=1, units=16, dropout_rate=0.1)(x)
x = keras.layers.Dense(16, activation='relu')(selected)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)

model.compile(optimizer='adam', loss='binary_crossentropy')

In a Functional Model

import keras
from kerasfactory.layers import VariableSelection

# Define inputs
features_input = keras.Input(shape=(10, 16), name='features')
context_input = keras.Input(shape=(64,), name='context')

# Apply variable selection with context
selected_features, weights = VariableSelection(
    nr_features=10, 
    units=32, 
    dropout_rate=0.1, 
    use_context=True
)([features_input, context_input])

# Continue processing
x = keras.layers.Dense(64, activation='relu')(selected_features)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model([features_input, context_input], outputs)

Advanced Configuration

# Advanced configuration with custom parameters
vs = VariableSelection(
    nr_features=20,
    units=64,           # Larger hidden units for complex selection
    dropout_rate=0.2,   # Higher dropout for regularization
    use_context=True,   # Enable context conditioning
    name="advanced_variable_selection"
)

# Use in a complex model
features = keras.Input(shape=(20, 32), name='features')
context = keras.Input(shape=(128,), name='context')

selected, weights = vs([features, context])

# Multi-task processing
task1 = keras.layers.Dense(32, activation='relu')(selected)
task1 = keras.layers.Dense(5, activation='softmax', name='classification')(task1)

task2 = keras.layers.Dense(16, activation='relu')(selected)
task2 = keras.layers.Dense(1, name='regression')(task2)

model = keras.Model([features, context], [task1, task2])

📖 API Reference

kerasfactory.layers.VariableSelection

This module implements a VariableSelection layer that applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It's particularly useful for dynamic feature selection in time series and tabular models.

Classes

VariableSelection
VariableSelection(
    nr_features: int,
    units: int,
    dropout_rate: float = 0.1,
    use_context: bool = False,
    name: str | None = None,
    **kwargs: Any
)

Layer for dynamic feature selection using gated residual networks.

This layer applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It can optionally use a context vector to condition the feature selection.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| nr_features | int | Number of input features | required |
| units | int | Number of hidden units in the gated residual network | required |
| dropout_rate | float | Dropout rate for regularization | 0.1 |
| use_context | bool | Whether to use a context vector for conditioning | False |
| name | str | Name for the layer | None |
Input shape

If use_context is False:
  • Single tensor with shape: (batch_size, nr_features, feature_dim)

If use_context is True:
  • List of two tensors:
    • Features tensor with shape: (batch_size, nr_features, feature_dim)
    • Context tensor with shape: (batch_size, context_dim)

Output shape

Tuple of two tensors:
  • Selected features: (batch_size, feature_dim)
  • Feature weights: (batch_size, nr_features)

Example
import keras
from kerasfactory.layers import VariableSelection

# Create sample input data
x = keras.random.normal((32, 10, 16))  # 32 batches, 10 features, 16 dims per feature

# Without context
vs = VariableSelection(nr_features=10, units=32, dropout_rate=0.1)
selected, weights = vs(x)
print("Selected features shape:", selected.shape)  # (32, 16)
print("Feature weights shape:", weights.shape)  # (32, 10)

# With context
context = keras.random.normal((32, 64))  # 32 batches, 64-dim context
vs_context = VariableSelection(nr_features=10, units=32, dropout_rate=0.1, use_context=True)
selected, weights = vs_context([x, context])

Initialize the VariableSelection layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| nr_features | int | Number of input features. | required |
| units | int | Number of units in the selection network. | required |
| dropout_rate | float | Dropout rate. | 0.1 |
| use_context | bool | Whether to use context for selection. | False |
| name | str \| None | Name of the layer. | None |
| **kwargs | Any | Additional keyword arguments. | {} |
Source code in kerasfactory/layers/VariableSelection.py
def __init__(
    self,
    nr_features: int,
    units: int,
    dropout_rate: float = 0.1,
    use_context: bool = False,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the VariableSelection layer.

    Args:
        nr_features: Number of input features.
        units: Number of units in the selection network.
        dropout_rate: Dropout rate.
        use_context: Whether to use context for selection.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._nr_features = nr_features
    self._units = units
    self._dropout_rate = dropout_rate
    self._use_context = use_context

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.nr_features = self._nr_features
    self.units = self._units
    self.dropout_rate = self._dropout_rate
    self.use_context = self._use_context

    # Initialize layers
    self.feature_grns: list[GatedResidualNetwork] | None = None
    self.grn_var: GatedResidualNetwork | None = None
    self.softmax: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
compute_output_shape(
    input_shape: tuple[int, ...] | list[tuple[int, ...]]
) -> list[tuple[int, ...]]

Compute the output shape of the layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_shape | tuple[int, ...] \| list[tuple[int, ...]] | Shape of the input tensor or list of shapes if using context. | required |

Returns:

| Type | Description |
| --- | --- |
| list[tuple[int, ...]] | List of shapes for the output tensors. |

Source code in kerasfactory/layers/VariableSelection.py
def compute_output_shape(
    self,
    input_shape: tuple[int, ...] | list[tuple[int, ...]],
) -> list[tuple[int, ...]]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor or list of shapes if using context.

    Returns:
        List of shapes for the output tensors.
    """
    features_shape = input_shape[0] if self.use_context else input_shape

    # Handle different input shape types
    if isinstance(features_shape, list | tuple) and len(features_shape) > 0:
        batch_size = (
            int(features_shape[0])
            if isinstance(features_shape[0], int | float)
            else 1
        )
    else:
        batch_size = 1  # Default fallback

    return [
        (batch_size, self.units),  # Selected features
        (batch_size, self.nr_features),  # Feature weights
    ]

🔧 Parameters Deep Dive

nr_features (int)

  • Purpose: Number of input features to select from
  • Range: 1 to 1000+ (typically 5-50)
  • Impact: Must match the number of features in your input
  • Recommendation: Set to the actual number of features you want to select from

units (int)

  • Purpose: Number of hidden units in the selection network
  • Range: 8 to 512+ (typically 16-128)
  • Impact: Larger values = more complex selection patterns but more parameters
  • Recommendation: Start with 32, scale based on feature complexity

dropout_rate (float)

  • Purpose: Regularization to prevent overfitting
  • Range: 0.0 to 0.9
  • Impact: Higher values = more regularization but potentially less learning
  • Recommendation: Start with 0.1, increase if overfitting occurs

use_context (bool)

  • Purpose: Whether to use a context vector for conditioning
  • Default: False
  • Impact: Enables context-aware feature selection
  • Recommendation: Use True when you have contextual information (a sketch of building a context vector follows this list)
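
When use_context=True, the context is just a 2D tensor you build upstream of the layer. A hedged sketch of constructing one from categorical metadata (the input name and embedding sizes here are hypothetical):

import keras

# Hypothetical categorical input used to condition feature selection
day_of_week = keras.Input(shape=(1,), dtype='int32', name='day_of_week')
embedded = keras.layers.Embedding(input_dim=7, output_dim=8)(day_of_week)
context = keras.layers.Flatten()(embedded)  # (batch_size, 8) context vector
# Pass it as the second input: selected, weights = vs([features, context])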

📈 Performance Characteristics

  • Speed: ⚡⚡⚡ Fast for small to medium feature counts; cost scales with nr_features (see the parameter-count sketch after this list)
  • Memory: 💾💾💾 Moderate memory usage due to per-feature processing
  • Accuracy: 🎯🎯🎯🎯 Excellent for dynamic feature selection tasks
  • Best For: Time series and tabular data with varying feature importance
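
Because cost grows with nr_features, a quick way to gauge the footprint is to count parameters at a few sizes. A minimal sketch, assuming the layer is built by calling it on a symbolic input:

import keras
from kerasfactory.layers import VariableSelection

for n in (5, 20, 80):
    inp = keras.Input(shape=(n, 16))
    selected, weights = VariableSelection(nr_features=n, units=32)(inp)
    model = keras.Model(inp, [selected, weights])
    print(f"nr_features={n}: {model.count_params():,} parameters")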

🎨 Examples

Example 1: Time Series Feature Selection

import keras
import numpy as np
from kerasfactory.layers import VariableSelection

# Simulate time series data with multiple features
batch_size, time_steps, features = 32, 24, 8  # 24 hours, 8 features per hour
time_series_data = keras.random.normal((batch_size, time_steps, features))

# Context: time of day, day of week, etc.
context_data = keras.random.normal((batch_size, 16))  # 16-dim context

# Build time series model with variable selection
features_input = keras.Input(shape=(time_steps, features), name='time_series')
context_input = keras.Input(shape=(16,), name='context')

# Apply variable selection across time steps (each step is treated as a feature)
selected_features, weights = VariableSelection(
    nr_features=time_steps,
    units=32,
    dropout_rate=0.1,
    use_context=True
)([features_input, context_input])

# Process selected features
x = keras.layers.Dense(64, activation='relu')(selected_features)
x = keras.layers.Dropout(0.2)(x)
forecast = keras.layers.Dense(1)(x)  # Predict next value

model = keras.Model([features_input, context_input], forecast)
model.compile(optimizer='adam', loss='mse')

# Analyze feature weights over time via a helper model that exposes them
weights_model = keras.Model([features_input, context_input], weights)
weight_values = weights_model.predict([time_series_data, context_data])
print("Feature weights shape:", weight_values.shape)  # (32, 24)
print("Average weights per time step:", np.mean(weight_values, axis=0))

Example 2: Multi-Task Feature Selection

# Different tasks may need different feature selections
def create_multi_task_model():
    features = keras.Input(shape=(15, 20), name='features')  # 15 features, 20 dims each
    context = keras.Input(shape=(32,), name='context')

    # Shared variable selection
    selected, weights = VariableSelection(
        nr_features=15,
        units=48,
        dropout_rate=0.15,
        use_context=True
    )([features, context])

    # Task-specific processing
    # Classification task
    cls_features = keras.layers.Dense(64, activation='relu')(selected)
    cls_features = keras.layers.Dropout(0.3)(cls_features)
    classification = keras.layers.Dense(3, activation='softmax', name='classification')(cls_features)

    # Regression task
    reg_features = keras.layers.Dense(32, activation='relu')(selected)
    reg_features = keras.layers.Dropout(0.2)(reg_features)
    regression = keras.layers.Dense(1, name='regression')(reg_features)

    return keras.Model([features, context], [classification, regression])

model = create_multi_task_model()
model.compile(
    optimizer='adam',
    loss={'classification': 'categorical_crossentropy', 'regression': 'mse'},
    loss_weights={'classification': 1.0, 'regression': 0.5}
)

Example 3: Feature Importance Analysis

import keras
import numpy as np
from kerasfactory.layers import VariableSelection

# Analyze which features are being selected
def analyze_feature_selection(model, test_data, feature_names=None):
    """Analyze feature selection patterns."""
    # Get the variable selection layer
    vs_layer = None
    for layer in model.layers:
        if isinstance(layer, VariableSelection):
            vs_layer = layer
            break

    if vs_layer is None:
        print("No VariableSelection layer found")
        return

    # Get feature weights (assumes the layer was built with use_context=True)
    features, context = test_data
    _, weights = vs_layer([features, context])

    # Analyze weights (convert the backend tensor to NumPy first)
    weights = keras.ops.convert_to_numpy(weights)
    avg_weights = np.mean(weights, axis=0)
    print("Average feature weights:")
    for i, weight in enumerate(avg_weights):
        feature_name = feature_names[i] if feature_names else f"Feature_{i}"
        print(f"{feature_name}: {weight:.4f}")

    # Find most important features
    top_features = np.argsort(avg_weights)[-5:]  # Top 5 features
    print(f"\nTop 5 most important features: {top_features}")

    return weights

# Use with your model
# weights = analyze_feature_selection(model, [test_features, test_context], feature_names)

💡 Tips & Best Practices

  • Feature Dimension: Ensure feature_dim is consistent across all features
  • Context Usage: Use context vectors when you have relevant contextual information
  • Units Sizing: Start with units = nr_features * 2 and adjust based on complexity (see the sketch after this list)
  • Regularization: Use appropriate dropout to prevent overfitting
  • Weight Analysis: Monitor feature weights to understand selection patterns
  • Batch Size: Works best with larger batch sizes for stable weight learning
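
As a concrete starting point for the units-sizing tip (a heuristic to tune from, not a rule):

from kerasfactory.layers import VariableSelection

nr_features = 15
vs = VariableSelection(
    nr_features=nr_features,
    units=nr_features * 2,  # heuristic starting point; adjust based on complexity
    dropout_rate=0.1,
)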

⚠️ Common Pitfalls

  • Input Shape: Must be a 3D tensor (batch_size, nr_features, feature_dim); see the reshape sketch after this list
  • Context Mismatch: Context vector must be 2D (batch_size, context_dim)
  • Feature Count: nr_features must match actual number of input features
  • Memory Usage: Scales with nr_features - be careful with large feature counts
  • Weight Interpretation: Weights are relative, not absolute importance
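
The input-shape pitfall most often bites with 2D tabular data, which needs a trailing feature dimension before entering the layer. A minimal sketch, assuming a feature_dim of 1 is acceptable for scalar features:

import keras
from keras import ops
from kerasfactory.layers import VariableSelection

tabular = keras.random.normal((32, 10))   # (batch_size, nr_features): too flat
x = ops.expand_dims(tabular, axis=-1)     # (batch_size, nr_features, 1): valid 3D input
selected, weights = VariableSelection(nr_features=10, units=16)(x)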

📚 Further Reading