🎲 StochasticDepth
🎯 Overview
The StochasticDepth layer randomly drops entire residual branches with a specified probability during training, helping reduce overfitting and training time in deep networks. During inference, all branches are kept and scaled appropriately.
This layer is particularly powerful for deep neural networks where overfitting is a concern, providing a regularization technique that's specifically designed for residual architectures.
🔍 How It Works
The StochasticDepth layer regularizes a residual connection through stochastic branch dropping:
- Training Mode: Randomly drops the residual branch with probability 1 − survival_prob
- Inference Mode: Keeps the residual branch and scales it by survival_prob
- Random Generation: Draws an (optionally seeded) random sample to decide whether the branch survives
- Scaling: Scales the residual contribution at inference so the expected output matches training
- Output Generation: Produces the regularized output
```mermaid
graph TD
    A[Input Features] --> B{Training Mode?}
    B -->|Yes| C[Random Branch Selection]
    B -->|No| D[Scale by Survival Probability]
    C --> E[Drop Residual Branch]
    C --> F[Keep Residual Branch]
    E --> G[Output = Shortcut]
    F --> H[Output = Shortcut + Residual]
    D --> I[Output = Shortcut + (Survival Prob × Residual)]
    G --> J[Final Output]
    H --> J
    I --> J

    style A fill:#e6f3ff,stroke:#4a86e8
    style J fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style D fill:#e1f5fe,stroke:#03a9f4
```
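To make the two modes concrete, here is a minimal sketch of the behavior in the diagram. The import path is an assumption based on this page's module reference, and the `[shortcut, residual]` call convention follows the input format noted under Common Pitfalls:

```python
from keras import ops
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

shortcut = ops.ones((1, 4))
residual = ops.ones((1, 4))
layer = StochasticDepth(survival_prob=0.5, seed=7)

# Training: the output is either the shortcut alone (branch dropped)
# or shortcut + residual (branch kept), varying from call to call.
for _ in range(3):
    print(ops.convert_to_numpy(layer([shortcut, residual], training=True)))

# Inference: deterministic blend, shortcut + 0.5 * residual -> [[1.5 1.5 1.5 1.5]]
print(ops.convert_to_numpy(layer([shortcut, residual], training=False)))
```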
💡 Why Use This Layer?
| Challenge | Traditional Approach | StochasticDepth's Solution |
|---|---|---|
| Overfitting | Dropout on individual neurons | 🎯 Branch-level dropout for better regularization |
| Deep Networks | Limited depth due to overfitting | ⚡ Enables deeper networks with regularization |
| Training Time | Slower training with deep networks | 🔧 Faster training by dropping branches |
| Residual Networks | Standard dropout not optimal | 🔄 Designed for residual architectures |
📊 Use Cases
- Deep Neural Networks: Regularizing very deep networks
- Residual Architectures: Optimizing residual network training
- Overfitting Prevention: Reducing overfitting in complex models
- Training Acceleration: Faster training through branch dropping
- Ensemble Learning: Creating diverse network behaviors
🚀 Quick Start
Basic Usage
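A minimal sketch of the basic call convention: the layer consumes a `[shortcut, residual]` pair of same-shaped tensors. The import path is assumed from this page's module reference and the shapes are illustrative:

```python
import keras
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

# Two branches of a residual block: the shortcut and the residual transformation.
shortcut = keras.random.normal((32, 64))
residual = keras.random.normal((32, 64))

layer = StochasticDepth(survival_prob=0.8, seed=42)

# Training: the residual branch is dropped with probability 1 - 0.8 = 0.2.
y_train = layer([shortcut, residual], training=True)

# Inference: deterministic output = shortcut + 0.8 * residual.
y_infer = layer([shortcut, residual], training=False)
print(y_train.shape, y_infer.shape)  # (32, 64) (32, 64)
```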
In a Sequential Model
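Because the layer takes a pair of tensors, it cannot sit directly inside `Sequential`. A common workaround, sketched here with an illustrative `ResidualBlock` wrapper (not part of the library), is to combine a residual transformation with StochasticDepth in a small subclassed layer and stack those:

```python
import keras
from keras import layers
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

class ResidualBlock(layers.Layer):
    """Illustrative dense residual block regularized with stochastic depth."""

    def __init__(self, units, survival_prob=0.8, **kwargs):
        super().__init__(**kwargs)
        self.dense1 = layers.Dense(units, activation="relu")
        self.dense2 = layers.Dense(units)
        self.stochastic_depth = StochasticDepth(survival_prob=survival_prob)

    def call(self, inputs, training=None):
        residual = self.dense2(self.dense1(inputs))
        return self.stochastic_depth([inputs, residual], training=training)

model = keras.Sequential([
    keras.Input(shape=(64,)),
    layers.Dense(64, activation="relu"),
    ResidualBlock(64, survival_prob=0.9),
    ResidualBlock(64, survival_prob=0.8),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```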
In a Functional Model
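In the functional API the shortcut and residual branches can be wired explicitly. A sketch under the same assumed import path, with illustrative layer sizes:

```python
import keras
from keras import layers
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

inputs = keras.Input(shape=(32,))
x = layers.Dense(64, activation="relu")(inputs)

# Residual branch: two dense layers; the shortcut is x itself.
residual = layers.Dense(64, activation="relu")(x)
residual = layers.Dense(64)(residual)
x = StochasticDepth(survival_prob=0.8)([x, residual])

outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```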
Advanced Configuration
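A sketch of the documented configuration options: a fixed seed for reproducibility, a lower survival probability for stronger regularization, and a linearly decaying schedule for a stack of blocks (progressive stochastic depth, see Example 3 below):

```python
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

# Reproducible branch dropping across runs.
reproducible = StochasticDepth(survival_prob=0.8, seed=42)

# Stronger regularization: the residual branch survives only half the time.
aggressive = StochasticDepth(survival_prob=0.5)

# Linearly decaying survival probabilities for a stack of blocks.
num_blocks = 8
survival_probs = [1.0 - 0.5 * i / (num_blocks - 1) for i in range(num_blocks)]
depth_layers = [StochasticDepth(survival_prob=p) for p in survival_probs]
print([round(p, 2) for p in survival_probs])
# [1.0, 0.93, 0.86, 0.79, 0.71, 0.64, 0.57, 0.5]
```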
📖 API Reference
kerasfactory.layers.StochasticDepth
Stochastic depth layer for neural networks.
Classes
StochasticDepth
```python
StochasticDepth(
    survival_prob: float = 0.5,
    seed: int | None = None,
    **kwargs: dict[str, Any],
)
```
Stochastic depth layer for regularization.
This layer randomly drops entire residual branches with a specified probability during training. During inference, all branches are kept and scaled appropriately. This technique helps reduce overfitting and training time in deep networks.
Reference

Huang et al., *Deep Networks with Stochastic Depth*, ECCV 2016. https://arxiv.org/abs/1603.09382
Example
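A minimal illustration of the documented call convention (shapes illustrative, import path assumed from this page):

```python
import keras
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

shortcut = keras.random.normal((4, 16))
residual = keras.random.normal((4, 16))

layer = StochasticDepth(survival_prob=0.7, seed=0)
out = layer([shortcut, residual], training=True)   # branch dropped ~30% of the time
out = layer([shortcut, residual], training=False)  # shortcut + 0.7 * residual
print(out.shape)  # (4, 16)
```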
Initialize stochastic depth.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `survival_prob` | `float` | Probability of keeping the residual branch | `0.5` |
| `seed` | `int \| None` | Random seed for reproducibility | `None` |
| `**kwargs` | `dict[str, Any]` | Additional layer arguments | `{}` |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `survival_prob` is not in [0, 1] |
Source code in kerasfactory/layers/StochasticDepth.py
Functions
```python
compute_output_shape(
    input_shape: list[tuple[int, ...]],
) -> tuple[int, ...]
```
Compute output shape.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `list[tuple[int, ...]]` | List of input shape tuples | *required* |

Returns:

| Type | Description |
|---|---|
| `tuple[int, ...]` | Output shape tuple |
Source code in kerasfactory/layers/StochasticDepth.py
classmethod
```python
from_config(config: dict[str, Any]) -> StochasticDepth
```
Create layer from configuration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `dict[str, Any]` | Layer configuration dictionary | *required* |

Returns:

| Type | Description |
|---|---|
| `StochasticDepth` | StochasticDepth instance |
Source code in kerasfactory/layers/StochasticDepth.py
🔧 Parameters Deep Dive
survival_prob (float)
- Purpose: Probability of keeping the residual branch
- Range: 0.0 to 1.0 (typically 0.5-0.9)
- Impact: Higher values = less regularization, lower values = more regularization
- Recommendation: Start with 0.8, adjust based on overfitting
seed (int, optional)
- Purpose: Random seed for reproducibility
- Default: None (random)
- Impact: Controls randomness of branch dropping
- Recommendation: Use fixed seed for reproducible experiments
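A short sketch of both parameters in action (import path assumed from this page); the `ValueError` matches the constraint documented in the Raises table above:

```python
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

# Light regularization with reproducible drops: a common starting point.
sd = StochasticDepth(survival_prob=0.8, seed=42)

# survival_prob must lie in [0, 1]; out-of-range values fail at construction.
try:
    StochasticDepth(survival_prob=1.5)
except ValueError as err:
    print(err)
```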
📊 Performance Characteristics
- Speed: ⚡⚡⚡⚡ Very fast - simple conditional logic
- Memory: 💾 Low memory usage - no additional parameters
- Accuracy: 🎯🎯🎯🎯 Excellent for deep network regularization
- Best For: Deep residual networks where overfitting is a concern
🎨 Examples
Example 1: Deep Residual Network
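A sketch of a deep residual MLP regularized with stochastic depth, trained on synthetic data; the import path and the `residual_block` helper are illustrative assumptions:

```python
import numpy as np
import keras
from keras import layers
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

# Illustrative synthetic binary-classification data.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(1000, 32)).astype("float32")
y_train = (x_train.sum(axis=1) > 0).astype("float32")

def residual_block(x, units, survival_prob):
    """Dense residual block whose residual branch is stochastically dropped."""
    residual = layers.Dense(units, activation="relu")(x)
    residual = layers.Dense(units)(residual)
    x = StochasticDepth(survival_prob=survival_prob)([x, residual])
    return layers.Activation("relu")(x)

inputs = keras.Input(shape=(32,))
x = layers.Dense(64, activation="relu")(inputs)
for _ in range(6):  # a deep stack of regularized residual blocks
    x = residual_block(x, 64, survival_prob=0.8)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
```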
Example 2: Stochastic Depth Analysis
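A sketch that empirically checks the drop rate against `survival_prob`. It relies on the documented behavior that a dropped branch leaves only the shortcut, so with a zero shortcut any nonzero output marks a surviving branch:

```python
from keras import ops
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

shortcut = ops.zeros((1, 8))
residual = ops.ones((1, 8))
layer = StochasticDepth(survival_prob=0.8, seed=123)

trials, kept = 1000, 0
for _ in range(trials):
    out = layer([shortcut, residual], training=True)
    if ops.convert_to_numpy(out).max() > 0:  # residual contributed on this call
        kept += 1

print(f"Empirical survival rate: {kept / trials:.3f} (expected ~0.8)")

# Inference is deterministic: shortcut + 0.8 * residual.
print(ops.convert_to_numpy(layer([shortcut, residual], training=False)))
```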
Example 3: Progressive Stochastic Depth
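A sketch of progressive stochastic depth: the survival probability decays linearly with depth so that deeper, more specialized blocks are dropped more often, matching the tip below. The `survival_prob_for` helper is illustrative:

```python
import keras
from keras import layers
from kerasfactory.layers import StochasticDepth  # import path assumed from this page

def survival_prob_for(block_index, num_blocks, p_min=0.5):
    """Linearly decay survival probability from 1.0 (first block) to p_min (last)."""
    return 1.0 - (1.0 - p_min) * block_index / max(num_blocks - 1, 1)

num_blocks = 8
inputs = keras.Input(shape=(32,))
x = layers.Dense(64, activation="relu")(inputs)

for i in range(num_blocks):
    residual = layers.Dense(64, activation="relu")(x)
    residual = layers.Dense(64)(residual)
    # Early blocks stay stable (high survival); later blocks get stronger
    # regularization (lower survival).
    x = StochasticDepth(survival_prob=survival_prob_for(i, num_blocks))([x, residual])

outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
print([round(survival_prob_for(i, num_blocks), 2) for i in range(num_blocks)])
```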
💡 Tips & Best Practices
- Survival Probability: Start with 0.8, adjust based on overfitting
- Progressive Depth: Use decreasing survival probability for deeper layers
- Seed Setting: Use fixed seed for reproducible experiments
- Residual Networks: Works best with residual architectures
- Training Mode: Only applies during training, not inference
- Scaling: Automatic scaling during inference
⚠️ Common Pitfalls
- Input Format: Must be a list of [shortcut, residual] tensors
- Survival Probability: Must be between 0 and 1
- Training Mode: Only applies during training
- Memory Usage: No additional memory overhead
- Gradient Flow: May affect gradient flow during training
🔗 Related Layers
- BoostingBlock - Boosting block with residual connections
- GatedResidualNetwork - Gated residual networks
- FeatureCutout - Feature regularization
- BusinessRulesLayer - Business rules validation
📚 Further Reading
- Deep Networks with Stochastic Depth (Huang et al., 2016, arXiv:1603.09382) - Original stochastic depth paper
- Deep Residual Learning for Image Recognition (He et al., 2015, arXiv:1512.03385) - Residual network paper
- Regularization Techniques - Regularization concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering