🎯 GatedFeatureSelection

🟑 Intermediate βœ… Stable πŸ”₯ Popular

🎯 Overview

The GatedFeatureSelection layer implements a learnable feature selection mechanism using a sophisticated gating network. Each feature is assigned a dynamic importance weight between 0 and 1 through a multi-layer gating network that includes batch normalization and ReLU activations for stable training.

This layer is particularly powerful for dynamic feature importance learning, feature selection in time-series data, and implementing attention-like mechanisms for tabular data. It includes a small residual connection to maintain gradient flow and prevent information loss.

πŸ” How It Works

The GatedFeatureSelection layer processes features through the following gating pipeline:

  1. Feature Analysis: Analyzes all input features to understand their importance
  2. Gating Network: Uses a multi-layer network to compute feature weights
  3. Weight Generation: Produces sigmoid-activated weights between 0 and 1
  4. Residual Connection: Adds a small residual connection for gradient flow
  5. Weighted Output: Applies learned weights to scale feature importance
graph TD
    A[Input Features: batch_size, input_dim] --> B[Gating Network]
    B --> C[Hidden Layer 1 + ReLU + BatchNorm]
    C --> D[Hidden Layer 2 + ReLU + BatchNorm]
    D --> E[Output Layer + Sigmoid]
    E --> F[Feature Weights: 0-1]

    A --> G[Element-wise Multiplication]
    F --> G
    A --> H[Residual Connection × 0.1]
    G --> I[Weighted Features]
    H --> I
    I --> J[Final Output]

    style A fill:#e6f3ff,stroke:#4a86e8
    style J fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style F fill:#f3e5f5,stroke:#9c27b0
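
As a concrete illustration of the pipeline above, the forward computation can be sketched in plain Keras. This is a simplified reconstruction based on the description and diagram (two hidden blocks, a sigmoid output, and a 0.1 residual), not the library's actual source:

import keras
from keras import layers

input_dim, reduction_ratio = 20, 4
hidden_dim = max(input_dim // reduction_ratio, 1)

# Gating network: two hidden blocks with BatchNorm + ReLU, sigmoid output in (0, 1)
gate_net = keras.Sequential([
    layers.Dense(hidden_dim),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(hidden_dim),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(input_dim, activation="sigmoid"),
])

x = keras.random.normal((32, input_dim))
gates = gate_net(x)            # per-feature weights, shape (32, input_dim)
output = x * gates + 0.1 * x   # weighted features plus small residual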

πŸ’‘ Why Use This Layer?

Each row contrasts a challenge, the traditional approach, and GatedFeatureSelection's solution:

  • Feature Importance: instead of manual feature selection or uniform treatment, 🎯 feature importance is learned automatically through gating.
  • Dynamic Selection: instead of static feature selection decisions, ⚑ selection is context-aware and adapts to each input (demonstrated in the sketch below).
  • Gradient Flow: instead of risking vanishing gradients in the selection step, πŸ”— the residual connection maintains gradient flow.
  • Noise Reduction: instead of treating all features equally, 🧠 less important features are intelligently filtered.
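
The "context-aware" claim is easy to verify: the gate vector is a function of the input, so different samples receive different feature weights. A minimal sketch, using the gate_net attribute documented in the API reference below:

import keras
from kerasfactory.layers import GatedFeatureSelection

layer = GatedFeatureSelection(input_dim=8, reduction_ratio=2)
a = keras.random.normal((1, 8))
b = keras.random.normal((1, 8))
_ = layer(a)  # first call builds the internal gating network

# Different inputs yield different gate vectors: selection adapts per sample
print(layer.gate_net(a))
print(layer.gate_net(b))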

πŸ“Š Use Cases

  • Time Series Analysis: Dynamic feature selection for different time periods
  • Noise Reduction: Filtering out irrelevant or noisy features
  • Feature Engineering: Learning which features are most important
  • Attention Mechanisms: Implementing attention-like behavior for tabular data
  • High-Dimensional Data: Intelligently reducing feature space

πŸš€ Quick Start

Basic Usage

import keras
from kerasfactory.layers import GatedFeatureSelection

# Create sample input data
batch_size, input_dim = 32, 20
x = keras.random.normal((batch_size, input_dim))

# Apply gated feature selection
gated_selection = GatedFeatureSelection(input_dim=input_dim, reduction_ratio=4)
selected_features = gated_selection(x)

print(f"Input shape: {x.shape}")           # (32, 20)
print(f"Output shape: {selected_features.shape}")  # (32, 20)

In a Sequential Model

import keras
from kerasfactory.layers import GatedFeatureSelection

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu'),
    GatedFeatureSelection(input_dim=64, reduction_ratio=8),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In a Functional Model

import keras
from kerasfactory.layers import GatedFeatureSelection

# Define inputs
inputs = keras.Input(shape=(20,))  # 20 features

# Process features
x = keras.layers.Dense(64, activation='relu')(inputs)
x = GatedFeatureSelection(input_dim=64, reduction_ratio=4)(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(32, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

Advanced Configuration

import keras
from kerasfactory.layers import GatedFeatureSelection

# Advanced configuration with custom reduction ratio
gated_selection = GatedFeatureSelection(
    input_dim=128,
    reduction_ratio=16,  # More aggressive reduction
    name="custom_gated_selection"
)

# Use in a complex model
inputs = keras.Input(shape=(50,))
x = keras.layers.Dense(128, activation='relu')(inputs)
x = keras.layers.BatchNormalization()(x)
x = gated_selection(x)  # Apply gated selection
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.Dropout(0.3)(x)
outputs = keras.layers.Dense(5, activation='softmax')(x)

model = keras.Model(inputs, outputs)

πŸ“– API Reference

kerasfactory.layers.GatedFeatureSelection

GatedFeatureSelection(
    input_dim: int,
    reduction_ratio: int = 4,
    **kwargs: dict[str, Any]
)

Gated feature selection layer with residual connection.

This layer implements a learnable feature selection mechanism using a gating network. Each feature is assigned a dynamic importance weight between 0 and 1 through a multi-layer gating network. The gating network includes batch normalization and ReLU activations for stable training. A small residual connection (scaled by 0.1) is added to maintain gradient flow.

The layer is particularly useful for:

  1. Dynamic feature importance learning
  2. Feature selection in time-series data
  3. Attention-like mechanisms for tabular data
  4. Reducing noise in input features

Example:

import numpy as np
from keras import layers, Model
from kerasfactory.layers import GatedFeatureSelection

# Create sample input data
input_dim = 20
data = np.random.normal(size=(100, input_dim))

# Build model with gated feature selection
inputs = layers.Input(shape=(input_dim,))
x = GatedFeatureSelection(input_dim=input_dim, reduction_ratio=4)(inputs)
outputs = layers.Dense(1)(x)
model = Model(inputs=inputs, outputs=outputs)

# The layer will learn which features are most important
# and dynamically adjust their contribution to the output

Parameters:

  • input_dim (int, required): Dimension of the input features. Must match the last dimension of the input tensor.
  • reduction_ratio (int, default 4): Ratio to reduce the hidden dimension of the gating network. The hidden dimension will be max(input_dim // reduction_ratio, 1); a higher ratio means fewer parameters but potentially less expressive gates.
  • **kwargs (dict[str, Any]): Additional layer arguments passed to the parent Layer class.
Source code in kerasfactory/layers/GatedFeaturesSelection.py
def __init__(
    self,
    input_dim: int,
    reduction_ratio: int = 4,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize the gated feature selection layer.

    Args:
        input_dim: Dimension of the input features. Must match the last dimension
            of the input tensor.
        reduction_ratio: Ratio to reduce the hidden dimension of the gating network.
            The hidden dimension will be max(input_dim // reduction_ratio, 1).
            Default is 4.
        **kwargs: Additional layer arguments passed to the parent Layer class.
    """
    super().__init__(**kwargs)
    self.input_dim = input_dim
    self.reduction_ratio = reduction_ratio
    self.gate_net: Sequential | None = None
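
Note that the gating network is created lazily: gate_net is None after __init__ and is presumably instantiated when the layer is first built, per standard Keras build semantics. A minimal check, assuming that convention:

import keras
from kerasfactory.layers import GatedFeatureSelection

layer = GatedFeatureSelection(input_dim=16)
print(layer.gate_net)                      # None: network not yet created

_ = layer(keras.random.normal((4, 16)))    # first call triggers build()
print(type(layer.gate_net).__name__)       # the internal gating Sequential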

Functions

from_config classmethod
from_config(
    config: dict[str, Any]
) -> GatedFeatureSelection

Create layer from configuration.

Parameters:

  • config (dict[str, Any], required): Layer configuration dictionary.

Returns:

  • GatedFeatureSelection: A GatedFeatureSelection instance.

Source code in kerasfactory/layers/GatedFeaturesSelection.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> "GatedFeatureSelection":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        GatedFeatureSelection instance
    """
    return cls(**config)
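
Since from_config simply forwards the config to the constructor, a round trip through the standard Keras get_config/from_config pair is straightforward. A minimal sketch, assuming get_config includes input_dim and reduction_ratio (the usual convention for serializable custom layers):

from kerasfactory.layers import GatedFeatureSelection

layer = GatedFeatureSelection(input_dim=20, reduction_ratio=4)
config = layer.get_config()                           # standard Keras layer API
restored = GatedFeatureSelection.from_config(config)  # fresh, unbuilt copy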

πŸ”§ Parameters Deep Dive

input_dim (int)

  • Purpose: Dimension of the input features
  • Range: 1 to 1000+ (typically 10-100)
  • Impact: Must match the last dimension of your input tensor
  • Recommendation: Set to the output dimension of your previous layer

reduction_ratio (int)

  • Purpose: Ratio to reduce the hidden dimension of the gating network
  • Range: 2 to 32+ (typically 4-16)
  • Impact: Higher ratio = fewer parameters but potentially less expressive gates
  • Recommendation: Start with 4, increase for more aggressive feature selection

πŸ“ˆ Performance Characteristics

  • Speed: ⚑⚑⚑ Fast - simple neural network computation
  • Memory: πŸ’ΎπŸ’Ύ Low memory usage - minimal additional parameters
  • Accuracy: 🎯🎯🎯 Good for feature importance and noise reduction
  • Best For: Tabular data where feature importance varies by context

🎨 Examples

Example 1: Time Series Feature Selection

import keras
import numpy as np
from kerasfactory.layers import GatedFeatureSelection

# Simulate time series data with varying feature importance
batch_size, time_steps, features = 32, 24, 15
time_series_data = keras.random.normal((batch_size, time_steps, features))

# Build time series model with gated selection
inputs = keras.Input(shape=(time_steps, features))

# Project features at each time step
x = keras.layers.Dense(32, activation='relu')(inputs)

# Apply gated feature selection independently at each time step
x = keras.layers.TimeDistributed(
    GatedFeatureSelection(input_dim=32, reduction_ratio=8)
)(x)

# Sequence modelling on the gated features
x = keras.layers.LSTM(64, return_sequences=True)(x)
x = keras.layers.LSTM(32)(x)
output = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, output)
model.compile(optimizer='adam', loss='binary_crossentropy')

Example 2: Multi-Task Feature Selection

import keras
from kerasfactory.layers import GatedFeatureSelection

# Different tasks may need different feature selections
def create_multi_task_model():
    inputs = keras.Input(shape=(25,))  # 25 features

    # Shared feature processing with gated selection
    x = keras.layers.Dense(64, activation='relu')(inputs)
    x = keras.layers.BatchNormalization()(x)
    x = GatedFeatureSelection(input_dim=64, reduction_ratio=4)(x)

    # Task-specific processing
    # Classification task
    cls_features = keras.layers.Dense(32, activation='relu')(x)
    cls_features = keras.layers.Dropout(0.3)(cls_features)
    classification = keras.layers.Dense(3, activation='softmax', name='classification')(cls_features)

    # Regression task
    reg_features = keras.layers.Dense(16, activation='relu')(x)
    reg_features = keras.layers.Dropout(0.2)(reg_features)
    regression = keras.layers.Dense(1, name='regression')(reg_features)

    return keras.Model(inputs, [classification, regression])

model = create_multi_task_model()
model.compile(
    optimizer='adam',
    loss={'classification': 'categorical_crossentropy', 'regression': 'mse'},
    loss_weights={'classification': 1.0, 'regression': 0.5}
)

Example 3: Feature Importance Analysis

import numpy as np
import keras
from kerasfactory.layers import GatedFeatureSelection

# Analyze which features are being selected
def analyze_feature_selection(model, test_data, feature_names=None):
    """Analyze feature selection patterns.

    test_data must have shape (n_samples, input_dim) and contain the
    features that actually feed the gate, i.e. the activations of the
    layer preceding GatedFeatureSelection, not necessarily the raw
    model inputs.
    """
    # Find the gated feature selection layer
    gated_layer = None
    for layer in model.layers:
        if isinstance(layer, GatedFeatureSelection):
            gated_layer = layer
            break

    if gated_layer is None:
        print("No GatedFeatureSelection layer found")
        return None

    # Get feature weights (convert the backend tensor to numpy for analysis)
    weights = keras.ops.convert_to_numpy(gated_layer.gate_net(test_data))

    # Analyze weights
    avg_weights = np.mean(weights, axis=0)
    print("Average feature weights:")
    for i, weight in enumerate(avg_weights):
        feature_name = feature_names[i] if feature_names else f"Feature_{i}"
        print(f"{feature_name}: {weight:.4f}")

    # Find most important features
    top_features = np.argsort(avg_weights)[-5:]  # Top 5 features
    print(f"\nTop 5 most important features: {top_features}")

    return weights

# Use with your model
# weights = analyze_feature_selection(model, test_data, feature_names)

πŸ’‘ Tips & Best Practices

  • Reduction Ratio: Start with 4, adjust based on feature complexity and model size
  • Residual Connection: The 0.1 residual connection helps maintain gradient flow
  • Batch Normalization: The gating network includes batch norm for stable training
  • Feature Preprocessing: Ensure features are properly normalized before selection
  • Monitoring: Track feature weights to understand selection patterns
  • Regularization: Combine with dropout to prevent overfitting

⚠️ Common Pitfalls

  • Input Dimension: Must match the last dimension of your input tensor
  • Reduction Ratio: Too high can lead to underfitting, too low to overfitting
  • Gradient Flow: The residual connection helps but monitor for vanishing gradients
  • Feature Interpretation: Weights are relative, not absolute importance
  • Memory Usage: Scales with input_dim, be careful with very large feature spaces

πŸ“š Further Reading