โœ‚๏ธ FeatureCutout

โœ‚๏ธ FeatureCutout

🟡 Intermediate ✅ Stable 🔥 Popular

🎯 Overview

The FeatureCutout layer randomly masks out a specified fraction of input features during training, replacing them with a configurable noise value (zero by default), to improve model robustness and prevent overfitting. During inference, all features pass through unchanged, so the layer acts purely as a training-time regularizer.

This layer is particularly effective for tabular data where feature interactions are complex and overfitting is a common concern. It forces the model to learn robust representations that don't rely on any single feature.

๐Ÿ” How It Works

The FeatureCutout layer applies random masking during training (a minimal sketch of this logic follows the diagram below):

  1. Training Mode Check: Only applies masking during training
  2. Random Mask Generation: Creates random mask based on cutout probability
  3. Feature Masking: Sets selected features to specified noise value
  4. Inference Passthrough: Returns original features during inference
  5. Output Generation: Produces masked or original features based on mode
graph TD
    A[Input Features] --> B{Training Mode?}
    B -->|Yes| C[Generate Random Mask]
    B -->|No| H[Return Original Features]

    C --> D[Apply Cutout Probability]
    D --> E[Create Binary Mask]
    E --> F[Apply Mask to Features]
    F --> G[Return Masked Features]

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style H fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
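
A minimal, backend-agnostic sketch of the masking logic above (illustrative only, not the library's exact source):

from keras import ops, random

def feature_cutout(x, cutout_prob=0.1, noise_value=0.0, training=False, seed=None):
    """Sketch of the forward pass: mask each feature independently."""
    if not training or cutout_prob == 0.0:
        return x  # inference passthrough: features are untouched
    # Bernoulli keep-mask: 1.0 keeps a feature, 0.0 cuts it out
    keep = ops.cast(random.uniform(ops.shape(x), seed=seed) >= cutout_prob, x.dtype)
    return x * keep + (1.0 - keep) * noise_value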

💡 Why Use This Layer?

| Challenge | Traditional Approach | FeatureCutout's Solution |
| --- | --- | --- |
| Overfitting | Dropout on hidden layers only | 🎯 Feature-level regularization prevents overfitting on input features |
| Feature Dependencies | Model may rely on specific features | ⚡ Forces robustness by randomly removing features |
| Generalization | Poor performance on unseen data | 🧠 Improves generalization through feature masking |
| Data Augmentation | Limited augmentation for tabular data | 🔗 Tabular data augmentation through feature masking |

📊 Use Cases

  • Overfitting Prevention: Regularizing models prone to overfitting
  • Feature Robustness: Ensuring models don't rely on specific features
  • Data Augmentation: Augmenting tabular datasets during training
  • Generalization: Improving model performance on unseen data
  • Feature Importance: Understanding which features are truly important

🚀 Quick Start

Basic Usage

import keras
from kerasfactory.layers import FeatureCutout

# Create sample input data
batch_size, feature_dim = 32, 10
x = keras.random.normal((batch_size, feature_dim))

# Apply feature cutout
cutout = FeatureCutout(cutout_prob=0.2)
masked = cutout(x, training=True)

print(f"Input shape: {x.shape}")           # (32, 10)
print(f"Output shape: {masked.shape}")     # (32, 10)
print(f"Training mode: {cutout.training}") # True

In a Sequential Model

import keras
from kerasfactory.layers import FeatureCutout

model = keras.Sequential([
    FeatureCutout(cutout_prob=0.15),  # Apply feature cutout first
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In a Functional Model

import keras
from kerasfactory.layers import FeatureCutout

# Define inputs
inputs = keras.Input(shape=(20,))  # 20 features

# Apply feature cutout
x = FeatureCutout(cutout_prob=0.1)(inputs)

# Continue processing
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(32, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

Advanced Configuration

import keras
from kerasfactory.layers import FeatureCutout

# Advanced configuration with custom parameters
cutout = FeatureCutout(
    cutout_prob=0.25,        # Higher cutout probability
    noise_value=-1.0,        # Custom noise value instead of zero
    seed=42,                 # Fixed seed for reproducibility
    name="custom_feature_cutout"
)

# Use in a complex model
inputs = keras.Input(shape=(50,))

# Apply feature cutout
x = cutout(inputs)

# Multi-branch processing
branch1 = keras.layers.Dense(32, activation='relu')(x)
branch2 = keras.layers.Dense(32, activation='tanh')(x)

# Combine branches
x = keras.layers.Concatenate()([branch1, branch2])
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.Dropout(0.3)(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

📖 API Reference

kerasfactory.layers.FeatureCutout

Feature cutout regularization layer for neural networks.

Classes

FeatureCutout
FeatureCutout(
    cutout_prob: float = 0.1,
    noise_value: float = 0.0,
    seed: int | None = None,
    **kwargs: dict[str, Any]
)

Feature cutout regularization layer.

This layer randomly masks out (sets to zero) a specified fraction of features during training to improve model robustness and prevent overfitting. During inference, all features are kept intact.

Example
from keras import random
from kerasfactory.layers import FeatureCutout

# Create sample data
batch_size = 32
feature_dim = 10
inputs = random.normal((batch_size, feature_dim))

# Apply feature cutout
cutout = FeatureCutout(cutout_prob=0.2)
masked_outputs = cutout(inputs, training=True)

Initialize feature cutout.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cutout_prob | float | Probability of masking each feature | 0.1 |
| noise_value | float | Value to use for masked features | 0.0 |
| seed | int \| None | Random seed for reproducibility | None |
| **kwargs | dict[str, Any] | Additional layer arguments | {} |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If cutout_prob is not in [0, 1] |

Source code in kerasfactory/layers/FeatureCutout.py
def __init__(
    self,
    cutout_prob: float = 0.1,
    noise_value: float = 0.0,
    seed: int | None = None,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize feature cutout.

    Args:
        cutout_prob: Probability of masking each feature
        noise_value: Value to use for masked features (default: 0.0)
        seed: Random seed for reproducibility
        **kwargs: Additional layer arguments

    Raises:
        ValueError: If cutout_prob is not in [0, 1]
    """
    super().__init__(**kwargs)

    if not 0 <= cutout_prob <= 1:
        raise ValueError(f"cutout_prob must be in [0, 1], got {cutout_prob}")

    self.cutout_prob = cutout_prob
    self.noise_value = noise_value
    self.seed = seed

    # Create random generator with fixed seed ("is not None" so that seed=0 is honored)
    self._rng = random.SeedGenerator(seed) if seed is not None else None

Functions
compute_output_shape
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute output shape.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_shape | tuple[int, ...] | Input shape tuple | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[int, ...] | Output shape tuple |

Source code in kerasfactory/layers/FeatureCutout.py
def compute_output_shape(
    self,
    input_shape: tuple[int, ...],
) -> tuple[int, ...]:
    """Compute output shape.

    Args:
        input_shape: Input shape tuple

    Returns:
        Output shape tuple
    """
    return input_shape

from_config classmethod
from_config(config: dict[str, Any]) -> FeatureCutout

Create layer from configuration.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Layer configuration dictionary | required |

Returns:

| Type | Description |
| --- | --- |
| FeatureCutout | FeatureCutout instance |

Source code in kerasfactory/layers/FeatureCutout.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> "FeatureCutout":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        FeatureCutout instance
    """
    return cls(**config)
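
A quick round-trip sketch, assuming the layer follows the standard Keras get_config/from_config contract:

from kerasfactory.layers import FeatureCutout

layer = FeatureCutout(cutout_prob=0.2, noise_value=-1.0, seed=7)
config = layer.get_config()                   # standard Keras serialization hook
restored = FeatureCutout.from_config(config)  # rebuild an equivalent layer
print(restored.cutout_prob, restored.noise_value, restored.seed)  # 0.2 -1.0 7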

🔧 Parameters Deep Dive

cutout_prob (float)

  • Purpose: Probability of masking each feature
  • Range: 0.0 to 1.0 (typically 0.05-0.3)
  • Impact: Higher values = more aggressive regularization
  • Recommendation: Start with 0.1-0.2, adjust based on overfitting

noise_value (float)

  • Purpose: Value to use for masked features
  • Default: 0.0
  • Impact: Affects how masked features are represented
  • Recommendation: Use 0.0 for most cases; consider a sentinel such as -1.0 when 0.0 is itself a common value in your normalized data

seed (int, optional)

  • Purpose: Random seed for reproducibility
  • Default: None (random)
  • Impact: Controls randomness of masking
  • Recommendation: Use a fixed seed for reproducible experiments (see the sketch below)
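
A short reproducibility sketch, assuming two layers constructed with the same seed draw identical masks:

import keras
from kerasfactory.layers import FeatureCutout

x = keras.random.normal((4, 8), seed=0)
a = FeatureCutout(cutout_prob=0.5, seed=42)(x, training=True)
b = FeatureCutout(cutout_prob=0.5, seed=42)(x, training=True)
print(bool(keras.ops.all(keras.ops.equal(a, b))))  # expected: True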

📈 Performance Characteristics

  • Speed: ⚡⚡⚡⚡ Very fast - simple masking operation
  • Memory: 💾 Low memory usage - no additional parameters (see the check after this list)
  • Accuracy: 🎯🎯🎯 Good for preventing overfitting
  • Best For: Tabular data where overfitting is a concern
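
A quick check of the "no additional parameters" claim above:

import keras
from kerasfactory.layers import FeatureCutout

model = keras.Sequential([
    keras.Input(shape=(10,)),
    FeatureCutout(cutout_prob=0.2),
])
print(model.count_params())  # 0 - masking adds no trainable weights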

🎨 Examples

Example 1: Overfitting Prevention

import keras
import numpy as np
from kerasfactory.layers import FeatureCutout

# Create a model prone to overfitting
def create_overfitting_model():
    inputs = keras.Input(shape=(100,))  # High-dimensional input

    # Apply feature cutout to prevent overfitting
    x = FeatureCutout(cutout_prob=0.2)(inputs)

    # Deep network
    x = keras.layers.Dense(256, activation='relu')(x)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.Dense(128, activation='relu')(x)
    x = keras.layers.Dropout(0.2)(x)
    x = keras.layers.Dense(64, activation='relu')(x)
    x = keras.layers.Dropout(0.1)(x)
    x = keras.layers.Dense(32, activation='relu')(x)

    # Output
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    return keras.Model(inputs, outputs)

model = create_overfitting_model()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train with feature cutout
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)

Example 2: Progressive Feature Cutout

import keras
from kerasfactory.layers import FeatureCutout

# Apply different cutout probabilities to different feature groups
def create_progressive_cutout_model():
    inputs = keras.Input(shape=(30,))

    # Split features into groups
    important_features = inputs[:, :10]    # First 10 features (important)
    regular_features = inputs[:, 10:25]    # Next 15 features (regular)
    noisy_features = inputs[:, 25:30]      # Last 5 features (potentially noisy)

    # Apply different cutout probabilities
    important_cutout = FeatureCutout(cutout_prob=0.05)(important_features)  # Low cutout
    regular_cutout = FeatureCutout(cutout_prob=0.15)(regular_features)      # Medium cutout
    noisy_cutout = FeatureCutout(cutout_prob=0.3)(noisy_features)           # High cutout

    # Combine features
    x = keras.layers.Concatenate()([important_cutout, regular_cutout, noisy_cutout])

    # Process combined features
    x = keras.layers.Dense(64, activation='relu')(x)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Dropout(0.2)(x)
    x = keras.layers.Dense(32, activation='relu')(x)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    return keras.Model(inputs, outputs)

model = create_progressive_cutout_model()
model.compile(optimizer='adam', loss='binary_crossentropy')

Example 3: Feature Importance Analysis

import keras
import numpy as np

# Analyze which features are most affected by cutout
def analyze_feature_importance(model, test_data, feature_names=None):
    """Rank features by the mean absolute prediction shift when each is masked."""
    # Get baseline performance
    baseline_pred = model(test_data)

    # Test each feature individually
    feature_importance = []

    for i in range(test_data.shape[1]):
        # Create a mutable NumPy copy of the test data
        test_copy = keras.ops.convert_to_numpy(test_data).copy()

        # Mask feature i
        test_copy[:, i] = 0.0

        # Get prediction with masked feature
        masked_pred = model(keras.ops.convert_to_tensor(test_copy))

        # Calculate importance as the mean absolute change in prediction
        importance = keras.ops.mean(keras.ops.abs(baseline_pred - masked_pred))
        feature_importance.append(float(keras.ops.convert_to_numpy(importance)))

    # Sort features by importance
    sorted_indices = np.argsort(feature_importance)[::-1]

    print("Feature importance (based on cutout sensitivity):")
    for i, idx in enumerate(sorted_indices[:10]):  # Top 10 features
        feature_name = feature_names[idx] if feature_names else f"Feature_{idx}"
        print(f"{i+1}. {feature_name}: {feature_importance[idx]:.4f}")

    return feature_importance

# Use with your model
# importance_scores = analyze_feature_importance(model, test_data, feature_names)

๐Ÿ’ก Tips & Best Practices

  • Cutout Probability: Start with 0.1-0.2, increase if overfitting persists
  • Feature Groups: Apply different cutout probabilities to different feature types
  • Seed Setting: Use fixed seed for reproducible experiments
  • Noise Value: Choose noise value based on your data distribution
  • Monitoring: Track validation performance to tune the cutout probability (see the sweep sketch after this list)
  • Combination: Use with other regularization techniques (dropout, batch norm)
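
A sketch of the tuning loop suggested above; build_model is a hypothetical helper returning a compiled model whose first layer is FeatureCutout(cutout_prob=p), and the train/validation arrays are assumed to exist:

results = {}
for p in (0.05, 0.1, 0.2, 0.3):
    model = build_model(p)  # hypothetical: compiled model with FeatureCutout(cutout_prob=p)
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                        epochs=20, verbose=0)
    results[p] = min(history.history["val_loss"])

best = min(results, key=results.get)
print(f"Best cutout_prob by validation loss: {best}")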

โš ๏ธ Common Pitfalls

  • Input Shape: Must be 2D tensor (batch_size, feature_dim)
  • Training Mode: Masking is applied only when training=True; inference leaves features intact (verified in the sketch after this list)
  • Over-regularization: Too high cutout_prob can hurt performance
  • Feature Dependencies: May not work well if features are highly correlated
  • Memory Usage: Creates temporary masks during training
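
A quick check of the training-mode behavior noted above:

import keras
from kerasfactory.layers import FeatureCutout

x = keras.random.normal((2, 5))
layer = FeatureCutout(cutout_prob=0.5, seed=0)
masked = layer(x, training=True)        # roughly half the features replaced
passthrough = layer(x, training=False)  # no masking at inference
print(bool(keras.ops.all(keras.ops.equal(passthrough, x))))  # expected: True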

📚 Further Reading