πŸš€ BoostingBlock

πŸ”΄ Advanced βœ… Stable πŸ”₯ Popular

🎯 Overview

The BoostingBlock simulates gradient boosting behavior in a neural network by computing a correction term via a configurable MLP and adding a scaled version to the input. This layer implements a weak learner that can be stacked to mimic the iterative residual-correction process of gradient boosting.

This layer is particularly useful for tabular data, where gradient boosting techniques excel, because it lets you combine end-to-end neural network training with boosting-style residual correction.

πŸ” How It Works

The BoostingBlock processes data through a boosting-inspired transformation (a code sketch follows the diagram below):

  1. MLP Processing: Applies a configurable MLP to the input
  2. Correction Computation: Computes a correction term from the MLP output
  3. Scaling: Applies a learnable or fixed scaling factor (gamma)
  4. Residual Addition: Adds the scaled correction to the original input
  5. Output Generation: Produces the boosted output
graph TD
    A[Input Features] --> B[MLP Processing]
    B --> C[Correction Term]
    C --> D[Gamma Scaling]
    D --> E[Scaled Correction]
    A --> F[Residual Addition]
    E --> F
    F --> G[Boosted Output]

    H[Learnable Gamma] --> D

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style F fill:#e1f5fe,stroke:#03a9f4
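
Conceptually, the forward pass is a single residual update. The snippet below is a minimal sketch built from plain Keras layers, not the layer's internal implementation; the hidden width (64), the fixed gamma value (0.5), and the projection back to the input dimension are illustrative assumptions.

import keras

# Illustrative sketch of one boosting step: output = x + gamma * f(x)
x = keras.random.normal((32, 16))                    # input features
hidden = keras.layers.Dense(64, activation="relu")   # hidden layer of the MLP f
project = keras.layers.Dense(16)                     # project back to the input dimension
gamma = 0.5                                          # stand-in for the (learnable) scaling factor

correction = project(hidden(x))    # correction term f(x)
output = x + gamma * correction    # residual addition
print(output.shape)                # (32, 16)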

πŸ’‘ Why Use This Layer?

| Challenge | Traditional Approach | BoostingBlock's Solution |
| --- | --- | --- |
| Gradient Boosting | Separate boosting algorithms | 🎯 Neural-network implementation of boosting |
| Residual Learning | Manual residual computation | ⚑ Automatic residual-correction learning |
| Weak Learners | Separate weak learner models | 🧠 Weak learners integrated into the network |
| Ensemble Learning | External ensemble methods | πŸ”— End-to-end ensemble learning |

πŸ“Š Use Cases

  • Tabular Data: Combining neural networks with boosting techniques
  • Residual Learning: Learning residual corrections iteratively
  • Ensemble Methods: Building ensemble models in neural networks
  • Gradient Boosting: Implementing boosting algorithms in neural networks
  • Weak Learners: Creating weak learners for ensemble methods

πŸš€ Quick Start

Basic Usage

import keras
from kerasfactory.layers import BoostingBlock

# Create sample input data
batch_size, input_dim = 32, 16
x = keras.random.normal((batch_size, input_dim))

# Apply boosting block
boosting_block = BoostingBlock(hidden_units=64)
output = boosting_block(x)

print(f"Input shape: {x.shape}")           # (32, 16)
print(f"Output shape: {output.shape}")     # (32, 16)

In a Sequential Model

import keras
from kerasfactory.layers import BoostingBlock

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu'),
    BoostingBlock(hidden_units=64),
    BoostingBlock(hidden_units=32),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In a Functional Model

import keras
from kerasfactory.layers import BoostingBlock

# Define inputs
inputs = keras.Input(shape=(20,))  # 20 features

# Apply boosting blocks
x = BoostingBlock(hidden_units=64)(inputs)
x = BoostingBlock(hidden_units=32)(x)
x = keras.layers.Dense(16, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

Advanced Configuration

import keras
from kerasfactory.layers import BoostingBlock

# Advanced configuration with multiple boosting blocks
def create_boosting_network():
    inputs = keras.Input(shape=(30,))

    # Multiple boosting blocks with different configurations
    x = BoostingBlock(
        hidden_units=[64, 32],  # Two hidden layers
        hidden_activation='selu',
        dropout_rate=0.1,
        gamma_trainable=True
    )(inputs)

    x = BoostingBlock(
        hidden_units=32,
        hidden_activation='relu',
        dropout_rate=0.1,
        gamma_trainable=True
    )(x)

    x = BoostingBlock(
        hidden_units=16,
        hidden_activation='tanh',
        dropout_rate=0.05,
        gamma_trainable=False
    )(x)

    # Final processing
    x = keras.layers.Dense(8, activation='relu')(x)
    x = keras.layers.Dropout(0.2)(x)

    # Multi-task output
    classification = keras.layers.Dense(3, activation='softmax', name='classification')(x)
    regression = keras.layers.Dense(1, name='regression')(x)

    return keras.Model(inputs, [classification, regression])

model = create_boosting_network()
model.compile(
    optimizer='adam',
    loss={'classification': 'categorical_crossentropy', 'regression': 'mse'},
    loss_weights={'classification': 1.0, 'regression': 0.5}
)
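
To sanity-check the multi-task setup, you can fit briefly on synthetic data. This is purely illustrative: the shapes and label construction below are assumptions that simply match the model defined above.

import numpy as np

# Synthetic data matching the 30-feature, two-output model above (illustration only)
x_train = np.random.normal(size=(256, 30)).astype("float32")
y_class = keras.utils.to_categorical(np.random.randint(0, 3, size=(256,)), num_classes=3)
y_reg = np.random.normal(size=(256, 1)).astype("float32")

model.fit(
    x_train,
    {"classification": y_class, "regression": y_reg},
    epochs=2,
    batch_size=32,
)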

πŸ“– API Reference

kerasfactory.layers.BoostingBlock

This module implements a BoostingBlock layer that simulates gradient boosting behavior in a neural network. The layer computes a correction term via a configurable MLP and adds a scaled version to the input.

Classes

BoostingBlock
BoostingBlock(
    hidden_units: int | list[int] = 64,
    hidden_activation: str = "relu",
    output_activation: str | None = None,
    gamma_trainable: bool = True,
    gamma_initializer: str | initializers.Initializer = "ones",
    use_bias: bool = True,
    kernel_initializer: str | initializers.Initializer = "glorot_uniform",
    bias_initializer: str | initializers.Initializer = "zeros",
    dropout_rate: float | None = None,
    name: str | None = None,
    **kwargs: Any
)

A neural network layer that simulates gradient boosting behavior.

This layer implements a weak learner that computes a correction term via a configurable MLP and adds a scaled version of this correction to the input. Stacking several such blocks can mimic the iterative residual-correction process of gradient boosting.

The output is computed as

output = inputs + gamma * f(inputs)

where:

  • f is a configurable MLP (default: a two-layer network)
  • gamma is a learnable or fixed scaling factor

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| hidden_units | int \| list[int] | Number of units in the hidden layer(s). An int gives a single hidden layer; a list of ints gives multiple hidden layers. | 64 |
| hidden_activation | str | Activation function for the hidden layers. | 'relu' |
| output_activation | str \| None | Activation function for the output layer. | None |
| gamma_trainable | bool | Whether the scaling factor gamma is trainable. | True |
| gamma_initializer | str \| Initializer | Initializer for the gamma scaling factor. | 'ones' |
| use_bias | bool | Whether to include bias terms in the dense layers. | True |
| kernel_initializer | str \| Initializer | Initializer for the dense layer kernels. | 'glorot_uniform' |
| bias_initializer | str \| Initializer | Initializer for the dense layer biases. | 'zeros' |
| dropout_rate | float \| None | Optional dropout rate applied after the hidden layers. | None |
| name | str \| None | Optional name for the layer. | None |

Input shape

N-D tensor with shape: (batch_size, ..., input_dim)

Output shape

Same shape as input: (batch_size, ..., input_dim)

Example
import tensorflow as tf
from kerasfactory.layers import BoostingBlock

# Create sample input data
x = tf.random.normal((32, 16))  # 32 samples, 16 features

# Basic usage
block = BoostingBlock(hidden_units=64)
y = block(x)
print("Output shape:", y.shape)  # (32, 16)

# Advanced configuration
block = BoostingBlock(
    hidden_units=[32, 16],  # Two hidden layers
    hidden_activation='selu',
    dropout_rate=0.1,
    gamma_trainable=False
)
y = block(x)

Initialize the BoostingBlock layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| hidden_units | int \| list[int] | Number of hidden units or list of units per layer. | 64 |
| hidden_activation | str | Activation function for hidden layers. | 'relu' |
| output_activation | str \| None | Activation function for the output layer. | None |
| gamma_trainable | bool | Whether the gamma parameter is trainable. | True |
| gamma_initializer | str \| Initializer | Initializer for the gamma parameter. | 'ones' |
| use_bias | bool | Whether to use bias. | True |
| kernel_initializer | str \| Initializer | Initializer for kernel weights. | 'glorot_uniform' |
| bias_initializer | str \| Initializer | Initializer for bias weights. | 'zeros' |
| dropout_rate | float \| None | Dropout rate. | None |
| name | str \| None | Name of the layer. | None |
| **kwargs | Any | Additional keyword arguments. | {} |

Source code in kerasfactory/layers/BoostingBlock.py
def __init__(
    self,
    hidden_units: int | list[int] = 64,
    hidden_activation: str = "relu",
    output_activation: str | None = None,
    gamma_trainable: bool = True,
    gamma_initializer: str | initializers.Initializer = "ones",
    use_bias: bool = True,
    kernel_initializer: str | initializers.Initializer = "glorot_uniform",
    bias_initializer: str | initializers.Initializer = "zeros",
    dropout_rate: float | None = None,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the BoostingBlock layer.

    Args:
        hidden_units: Number of hidden units or list of units per layer.
        hidden_activation: Activation function for hidden layers.
        output_activation: Activation function for output layer.
        gamma_trainable: Whether gamma parameter is trainable.
        gamma_initializer: Initializer for gamma parameter.
        use_bias: Whether to use bias.
        kernel_initializer: Initializer for kernel weights.
        bias_initializer: Initializer for bias weights.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set attributes before calling parent's __init__
    self._hidden_units = (
        [hidden_units] if isinstance(hidden_units, int) else hidden_units
    )
    self._hidden_activation = hidden_activation
    self._output_activation = output_activation
    self._gamma_trainable = gamma_trainable
    self._gamma_initializer = initializers.get(gamma_initializer)
    self._use_bias = use_bias
    self._kernel_initializer = initializers.get(kernel_initializer)
    self._bias_initializer = initializers.get(bias_initializer)
    self._dropout_rate = dropout_rate

    # Validate parameters
    if any(units <= 0 for units in self._hidden_units):
        raise ValueError("All hidden_units must be positive integers")
    if dropout_rate is not None and not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be between 0 and 1")

    super().__init__(name=name, **kwargs)

    # Now set public attributes
    self.hidden_units = self._hidden_units
    self.hidden_activation = self._hidden_activation
    self.output_activation = self._output_activation
    self.gamma_trainable = self._gamma_trainable
    self.gamma_initializer = self._gamma_initializer
    self.use_bias = self._use_bias
    self.kernel_initializer = self._kernel_initializer
    self.bias_initializer = self._bias_initializer
    self.dropout_rate = self._dropout_rate

πŸ”§ Parameters Deep Dive

hidden_units (int or list)

  • Purpose: Number of hidden units in the MLP
  • Range: 8 to 256+ (typically 32-128)
  • Impact: Larger values = more complex corrections
  • Recommendation: Start with 64, scale based on data complexity

hidden_activation (str)

  • Purpose: Activation function for hidden layers
  • Options: 'relu', 'selu', 'tanh', 'sigmoid', etc.
  • Default: 'relu'
  • Impact: Affects the correction term computation
  • Recommendation: Use 'relu' for most cases, 'selu' for deeper networks

gamma_trainable (bool)

  • Purpose: Whether the scaling factor is trainable
  • Default: True
  • Impact: Trainable gamma allows learning optimal scaling
  • Recommendation: Use True for most cases, False for fixed scaling

dropout_rate (float, optional)

  • Purpose: Dropout rate for regularization
  • Range: 0.0 to 0.5 (typically 0.1-0.2)
  • Impact: Higher values = more regularization
  • Recommendation: Use 0.1-0.2 for regularization
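
Putting these recommendations together, a reasonable starting configuration looks like the following. The specific values are suggestions, not requirements.

from kerasfactory.layers import BoostingBlock

# A typical starting point: 64 hidden units, ReLU, light dropout, trainable gamma
block = BoostingBlock(
    hidden_units=64,
    hidden_activation="relu",
    dropout_rate=0.1,
    gamma_trainable=True,
)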

πŸ“ˆ Performance Characteristics

  • Speed: ⚑⚑⚑ Fast - simple MLP computation
  • Memory: πŸ’ΎπŸ’Ύ Moderate memory usage due to MLP
  • Accuracy: 🎯🎯🎯🎯 Excellent for residual learning
  • Best For: Tabular data where boosting techniques are effective

🎨 Examples

Example 1: Gradient Boosting Simulation

import keras
import numpy as np
from kerasfactory.layers import BoostingBlock

# Create a gradient boosting simulation
def create_gradient_boosting_simulation():
    inputs = keras.Input(shape=(25,))

    # Multiple boosting blocks to simulate gradient boosting
    x = BoostingBlock(hidden_units=64, gamma_trainable=True)(inputs)
    x = BoostingBlock(hidden_units=64, gamma_trainable=True)(x)
    x = BoostingBlock(hidden_units=32, gamma_trainable=True)(x)
    x = BoostingBlock(hidden_units=32, gamma_trainable=True)(x)
    x = BoostingBlock(hidden_units=16, gamma_trainable=True)(x)

    # Final processing
    x = keras.layers.Dense(8, activation='relu')(x)
    x = keras.layers.Dropout(0.2)(x)

    # Output
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    return keras.Model(inputs, outputs)

model = create_gradient_boosting_simulation()
model.compile(optimizer='adam', loss='binary_crossentropy')

# Test with sample data
sample_data = keras.random.normal((100, 25))
predictions = model(sample_data)
print(f"Gradient boosting simulation predictions shape: {predictions.shape}")

Example 2: Residual Learning Analysis

import keras
from kerasfactory.layers import BoostingBlock

# Analyze residual learning in boosting blocks
def analyze_residual_learning():
    # Create model with boosting blocks
    inputs = keras.Input(shape=(15,))
    x = BoostingBlock(hidden_units=32, gamma_trainable=True)(inputs)
    x = BoostingBlock(hidden_units=16, gamma_trainable=True)(x)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    model = keras.Model(inputs, outputs)

    # Test with different input patterns
    test_inputs = [
        keras.random.normal((10, 15)),  # Random data
        keras.random.normal((10, 15)) * 2,  # Scaled data
        keras.random.normal((10, 15)) + 1,  # Shifted data
    ]

    print("Residual Learning Analysis:")
    print("=" * 40)

    for i, test_input in enumerate(test_inputs):
        prediction = model(test_input)
        print(f"Test {i+1}: Prediction mean = {float(keras.ops.mean(prediction)):.4f}")

    return model

# Analyze residual learning
# model = analyze_residual_learning()

Example 3: Boosting Block Comparison

import keras
from kerasfactory.layers import BoostingBlock

# Compare different boosting block configurations
def compare_boosting_configurations():
    inputs = keras.Input(shape=(20,))

    # Configuration 1: Single hidden layer
    x1 = BoostingBlock(hidden_units=64, gamma_trainable=True)(inputs)
    x1 = keras.layers.Dense(1, activation='sigmoid')(x1)
    model1 = keras.Model(inputs, x1)

    # Configuration 2: Multiple hidden layers
    x2 = BoostingBlock(hidden_units=[64, 32], gamma_trainable=True)(inputs)
    x2 = keras.layers.Dense(1, activation='sigmoid')(x2)
    model2 = keras.Model(inputs, x2)

    # Configuration 3: Fixed gamma
    x3 = BoostingBlock(hidden_units=64, gamma_trainable=False)(inputs)
    x3 = keras.layers.Dense(1, activation='sigmoid')(x3)
    model3 = keras.Model(inputs, x3)

    # Run each configuration on the same sample data to confirm the shared input shape
    test_data = keras.random.normal((50, 20))
    for m in (model1, model2, model3):
        _ = m(test_data)

    print("Boosting Block Comparison:")
    print("=" * 40)
    print(f"Single hidden layer: {model1.count_params()} parameters")
    print(f"Multiple hidden layers: {model2.count_params()} parameters")
    print(f"Fixed gamma: {model3.count_params()} parameters")

    return model1, model2, model3

# Compare configurations
# models = compare_boosting_configurations()

πŸ’‘ Tips & Best Practices

  • Hidden Units: Start with 64 units, scale based on data complexity
  • Gamma Training: Use trainable gamma for most applications
  • Activation Functions: Use 'relu' for most cases, 'selu' for deeper networks
  • Dropout: Use 0.1-0.2 dropout rate for regularization
  • Stacking: Stack multiple boosting blocks for better performance (see the sketch after this list)
  • Residual Learning: The layer automatically handles residual learning
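
Stacking in practice is just sequential application. The sketch below builds a stack of blocks in a loop; the tapering hidden widths (64, 32, 16) and the dropout rate are illustrative choices, not library defaults.

import keras
from kerasfactory.layers import BoostingBlock

def stacked_boosting(input_dim: int, widths=(64, 32, 16)) -> keras.Model:
    """Stack several BoostingBlocks so each one corrects the previous output."""
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    for units in widths:
        x = BoostingBlock(hidden_units=units, dropout_rate=0.1)(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)

model = stacked_boosting(input_dim=20)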

⚠️ Common Pitfalls

  • Hidden Units: Must be a positive integer or a list of positive integers; invalid values raise a ValueError (see the example after this list)
  • Gamma Training: Fixed gamma may limit learning capacity
  • Overfitting: Monitor for overfitting with complex configurations
  • Memory Usage: Scales with hidden units and number of layers
  • Gradient Flow: The residual connection aids gradient flow, but still monitor training stability in deep stacks
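
Invalid constructor arguments are caught immediately by the validation shown in the source above; for example:

from kerasfactory.layers import BoostingBlock

# Invalid hidden_units: every entry must be a positive integer
try:
    BoostingBlock(hidden_units=[64, 0])
except ValueError as err:
    print(err)  # "All hidden_units must be positive integers"

# Invalid dropout_rate: must satisfy 0 <= dropout_rate < 1
try:
    BoostingBlock(hidden_units=64, dropout_rate=1.5)
except ValueError as err:
    print(err)  # "dropout_rate must be between 0 and 1"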

πŸ“š Further Reading