# BoostingBlock

## Overview
The BoostingBlock simulates gradient boosting behavior in a neural network by computing a correction term via a configurable MLP and adding a scaled version to the input. This layer implements a weak learner that can be stacked to mimic the iterative residual-correction process of gradient boosting.
This layer is particularly useful for tabular data, where gradient boosting techniques tend to excel, because it brings a boosting-style residual correction into an end-to-end trainable neural network.
## How It Works
The BoostingBlock processes data through a boosting-inspired transformation:
- MLP Processing: Applies a configurable MLP to the input
- Correction Computation: Computes a correction term from the MLP output
- Scaling: Applies a learnable or fixed scaling factor (gamma)
- Residual Addition: Adds the scaled correction to the original input
- Output Generation: Produces the boosted output
```mermaid
graph TD
    A[Input Features] --> B[MLP Processing]
    B --> C[Correction Term]
    C --> D[Gamma Scaling]
    D --> E[Scaled Correction]
    A --> F[Residual Addition]
    E --> F
    F --> G[Boosted Output]
    H[Learnable Gamma] --> D
    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#f3e5f5,stroke:#9c27b0
    style F fill:#e1f5fe,stroke:#03a9f4
```
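The data flow above can be written down in a few lines. The snippet below is a conceptual NumPy sketch of the computation only; `mlp` and `gamma` are stand-ins for the layer's internal sub-network and scaling weight, not part of the public API:

```python
import numpy as np

def boosting_block_forward(x, mlp, gamma):
    """Conceptual forward pass: add a scaled correction to the input."""
    correction = mlp(x)          # MLP processing -> correction term
    scaled = gamma * correction  # gamma scaling (learnable in the real layer)
    return x + scaled            # residual addition -> boosted output

# Toy illustration with a stand-in "MLP" and a small fixed gamma.
x = np.array([[1.0, 2.0, 3.0]])
print(boosting_block_forward(x, mlp=lambda t: -0.5 * t, gamma=0.1))
```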
## Why Use This Layer?

| Challenge | Traditional Approach | BoostingBlock's Solution |
|---|---|---|
| Gradient Boosting | Separate boosting algorithms | Neural network implementation of boosting |
| Residual Learning | Manual residual computation | Automatic residual-correction learning |
| Weak Learners | Separate weak learner models | Weak learners integrated into neural networks |
| Ensemble Learning | External ensemble methods | End-to-end ensemble learning |
## Use Cases
- Tabular Data: Combining neural networks with boosting techniques
- Residual Learning: Learning residual corrections iteratively
- Ensemble Methods: Building ensemble models in neural networks
- Gradient Boosting: Implementing boosting algorithms in neural networks
- Weak Learners: Creating weak learners for ensemble methods
## Quick Start

### Basic Usage
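A minimal sketch, assuming a Keras 3 environment and that the class is importable as `from kerasfactory.layers import BoostingBlock` (adjust the import to the module `kerasfactory.layers.BoostingBlock` if it is not re-exported):

```python
import numpy as np
from kerasfactory.layers import BoostingBlock

# Default configuration: a single hidden layer with 64 units.
block = BoostingBlock(hidden_units=64)

# Apply the block to a batch of tabular features.
x = np.random.normal(size=(32, 16)).astype("float32")
y = block(x)

# The residual connection keeps the output shape equal to the input shape.
print(y.shape)  # (32, 16)
```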
### In a Sequential Model
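A sketch of stacking two blocks inside a `keras.Sequential` model, under the same import assumption as above; the input width and regression head are illustrative:

```python
import keras
from kerasfactory.layers import BoostingBlock

model = keras.Sequential([
    keras.Input(shape=(16,)),
    BoostingBlock(hidden_units=64, hidden_activation="relu"),
    BoostingBlock(hidden_units=64, hidden_activation="relu"),
    keras.layers.Dense(1),  # regression head on top of the boosted features
])

model.compile(optimizer="adam", loss="mse")
model.summary()
```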
### In a Functional Model
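A functional-API sketch under the same assumptions; the input name, layer sizes, and classification head are illustrative:

```python
import keras
from kerasfactory.layers import BoostingBlock

inputs = keras.Input(shape=(16,), name="features")

# Stack two blocks so the second refines the correction applied by the first.
x = BoostingBlock(hidden_units=[64, 32], dropout_rate=0.1)(inputs)
x = BoostingBlock(hidden_units=[64, 32], dropout_rate=0.1)(x)

outputs = keras.layers.Dense(1, activation="sigmoid", name="prediction")(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```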
### Advanced Configuration
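The sketch below exercises every documented constructor argument and round-trips the layer through `get_config`/`from_config`, which assumes the layer follows the standard Keras serialization contract:

```python
import keras
from kerasfactory.layers import BoostingBlock

# All documented constructor arguments, with illustrative values.
block = BoostingBlock(
    hidden_units=[128, 64],               # two hidden layers
    hidden_activation="selu",
    output_activation=None,
    gamma_trainable=True,                 # learn the scaling factor
    gamma_initializer="ones",
    use_bias=True,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    dropout_rate=0.2,
    name="boosting_block_advanced",
)

inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(block(inputs))
model = keras.Model(inputs, outputs)

# Serialization round-trip (assumes the usual get_config implementation).
config = block.get_config()
restored = BoostingBlock.from_config(config)
print(restored.name)
```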
## API Reference

### `kerasfactory.layers.BoostingBlock`
This module implements a BoostingBlock layer that simulates gradient boosting behavior in a neural network. The layer computes a correction term via a configurable MLP and adds a scaled version to the input.
#### Classes

##### BoostingBlock
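A constructor signature sketch reconstructed from the parameter table below; the actual source may differ in details such as type annotations:

```python
from __future__ import annotations

import keras

class BoostingBlock(keras.layers.Layer):
    """Signature sketch reconstructed from the documented parameters."""

    def __init__(
        self,
        hidden_units: int | list[int] = 64,
        hidden_activation: str = "relu",
        output_activation: str | None = None,
        gamma_trainable: bool = True,
        gamma_initializer: str | keras.initializers.Initializer = "ones",
        use_bias: bool = True,
        kernel_initializer: str | keras.initializers.Initializer = "glorot_uniform",
        bias_initializer: str | keras.initializers.Initializer = "zeros",
        dropout_rate: float | None = None,
        name: str | None = None,
        **kwargs,
    ) -> None:
        ...
```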
A neural network layer that simulates gradient boosting behavior.
This layer implements a weak learner that computes a correction term via a configurable MLP and adds a scaled version of this correction to the input. Stacking several such blocks can mimic the iterative residual-correction process of gradient boosting.
The output is computed as

`output = inputs + gamma * f(inputs)`

where:

- `f` is a configurable MLP (default: a two-layer network)
- `gamma` is a learnable or fixed scaling factor
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_units` | `int` or `list[int]` | Number of units in the hidden layer(s). Can be an int for a single hidden layer or a list of ints for multiple hidden layers. | `64` |
| `hidden_activation` | `str` | Activation function for hidden layers. | `'relu'` |
| `output_activation` | `str` or `None` | Activation function for the output layer. | `None` |
| `gamma_trainable` | `bool` | Whether the scaling factor gamma is trainable. | `True` |
| `gamma_initializer` | `str` or `Initializer` | Initializer for the gamma scaling factor. | `'ones'` |
| `use_bias` | `bool` | Whether to include bias terms in the dense layers. | `True` |
| `kernel_initializer` | `str` or `Initializer` | Initializer for the dense layer kernels. | `'glorot_uniform'` |
| `bias_initializer` | `str` or `Initializer` | Initializer for the dense layer biases. | `'zeros'` |
| `dropout_rate` | `float` or `None` | Optional dropout rate to apply after hidden layers. | `None` |
| `name` | `str` or `None` | Optional name for the layer. | `None` |
Input shape: N-D tensor with shape `(batch_size, ..., input_dim)`

Output shape: same shape as the input, `(batch_size, ..., input_dim)`
Example
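A minimal usage sketch consistent with the documented input and output shapes; the import path is assumed as in the Quick Start section:

```python
import numpy as np
from kerasfactory.layers import BoostingBlock

# Single block with two hidden layers and dropout.
layer = BoostingBlock(hidden_units=[32, 16], dropout_rate=0.1)

x = np.random.normal(size=(8, 10)).astype("float32")
y = layer(x)

# The residual connection preserves the input shape.
assert tuple(y.shape) == x.shape
```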
Initialize the BoostingBlock layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_units` | `int` or `list[int]` | Number of hidden units or list of units per layer. | `64` |
| `hidden_activation` | `str` | Activation function for hidden layers. | `'relu'` |
| `output_activation` | `str` or `None` | Activation function for the output layer. | `None` |
| `gamma_trainable` | `bool` | Whether the gamma parameter is trainable. | `True` |
| `gamma_initializer` | `str` or `Initializer` | Initializer for the gamma parameter. | `'ones'` |
| `use_bias` | `bool` | Whether to use bias. | `True` |
| `kernel_initializer` | `str` or `Initializer` | Initializer for kernel weights. | `'glorot_uniform'` |
| `bias_initializer` | `str` or `Initializer` | Initializer for bias weights. | `'zeros'` |
| `dropout_rate` | `float` or `None` | Dropout rate. | `None` |
| `name` | `str` or `None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/BoostingBlock.py
## Parameters Deep Dive

### `hidden_units` (int or list)
- Purpose: Number of hidden units in the MLP
- Range: 8 to 256+ (typically 32-128)
- Impact: Larger values = more complex corrections
- Recommendation: Start with 64, scale based on data complexity
### `hidden_activation` (str)
- Purpose: Activation function for hidden layers
- Options: 'relu', 'selu', 'tanh', 'sigmoid', etc.
- Default: 'relu'
- Impact: Affects the correction term computation
- Recommendation: Use 'relu' for most cases, 'selu' for deeper networks
### `gamma_trainable` (bool)
- Purpose: Whether the scaling factor is trainable
- Default: True
- Impact: Trainable gamma allows learning optimal scaling
- Recommendation: Use True for most cases, False for fixed scaling
### `dropout_rate` (float, optional)
- Purpose: Dropout rate for regularization
- Range: 0.0 to 0.5 (typically 0.1-0.2)
- Impact: Higher values = more regularization
- Recommendation: Use 0.1-0.2 for regularization
## Performance Characteristics

- Speed: Fast - simple MLP computation
- Memory: Moderate memory usage due to the MLP
- Accuracy: Excellent for residual learning
- Best For: Tabular data where boosting techniques are effective
## Examples

### Example 1: Gradient Boosting Simulation
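A sketch of the idea on synthetic tabular data: several stacked blocks act as weak learners that successively refine the representation before a final regression head. The dataset, block count, and hyperparameters are illustrative:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingBlock

# Synthetic tabular regression problem.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=1000)).astype("float32")

# Stack several weak learners to mimic boosting's iterative residual correction.
inputs = keras.Input(shape=(20,))
x = inputs
for i in range(4):
    x = BoostingBlock(hidden_units=32, name=f"booster_{i}")(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, mae]
```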
### Example 2: Residual Learning Analysis
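Because the block computes `output = inputs + gamma * f(inputs)`, the correction it actually applies can be recovered as `output - inputs`. The sketch below measures its magnitude and lists the block's weights, one of which is the gamma scaling factor; the printed numbers depend on the random initialization:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingBlock

block = BoostingBlock(hidden_units=64, gamma_trainable=True)
x = np.random.normal(size=(256, 16)).astype("float32")
y = keras.ops.convert_to_numpy(block(x))

# The applied correction is gamma * f(x), i.e. output minus input.
correction = y - x
print("mean |correction|:", np.abs(correction).mean())
print("max  |correction|:", np.abs(correction).max())

# Inspect the block's weights (MLP kernels/biases plus the gamma factor).
for w in block.weights:
    print(w.name, tuple(w.shape))
```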
### Example 3: Boosting Block Comparison
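A sketch comparing two illustrative configurations, a wider shallow stack versus a deeper narrow one, on a synthetic binary-classification task; the configurations and training budget are arbitrary:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingBlock

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16)).astype("float32")
y = (X[:, :4].sum(axis=1) > 0).astype("float32")

def build_model(n_blocks: int, hidden_units: int) -> keras.Model:
    inputs = keras.Input(shape=(16,))
    x = inputs
    for _ in range(n_blocks):
        x = BoostingBlock(hidden_units=hidden_units, dropout_rate=0.1)(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Compare a shallow/wide stack against a deeper/narrower one.
configs = {"2 blocks x 64 units": (2, 64), "4 blocks x 32 units": (4, 32)}
for label, (n_blocks, units) in configs.items():
    model = build_model(n_blocks, units)
    history = model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)
    print(label, "val_accuracy:", round(history.history["val_accuracy"][-1], 3))
```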
## Tips & Best Practices
- Hidden Units: Start with 64 units, scale based on data complexity
- Gamma Training: Use trainable gamma for most applications
- Activation Functions: Use 'relu' for most cases, 'selu' for deeper networks
- Dropout: Use 0.1-0.2 dropout rate for regularization
- Stacking: Stack multiple boosting blocks for better performance
- Residual Learning: The layer automatically handles residual learning
## Common Pitfalls
- Hidden Units: Must be a positive integer or a list of positive integers
- Gamma Training: Fixed gamma may limit learning capacity
- Overfitting: Monitor for overfitting with complex configurations
- Memory Usage: Scales with hidden units and number of layers
- Gradient Flow: Residual connections help but monitor training
## Related Layers
- BoostingEnsembleLayer - Ensemble of boosting blocks
- GatedResidualNetwork - Gated residual networks
- StochasticDepth - Stochastic depth regularization
- VariableSelection - Variable selection with GRN
## Further Reading
- Gradient Boosting - Gradient boosting concepts
- Residual Learning - Residual learning paper
- Ensemble Methods - Ensemble learning concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering