BoostingEnsembleLayer
Overview
The BoostingEnsembleLayer aggregates multiple BoostingBlocks in parallel, combining their outputs via learnable weights to form an ensemble prediction. This layer implements ensemble learning in a differentiable, end-to-end manner, allowing multiple weak learners to work together.
It is particularly effective for tabular data, where ensemble methods traditionally excel, and provides a neural-network analogue of classic boosting ensembles.
How It Works
The BoostingEnsembleLayer processes data through parallel boosting blocks:
- Parallel Processing: Creates multiple boosting blocks that process input independently
- Correction Computation: Each block computes its own correction term
- Gating Mechanism: Learns weights for combining block outputs
- Weighted Aggregation: Combines block outputs using learned weights
- Output Generation: Produces ensemble prediction
```mermaid
graph TD
    A[Input Features] --> B1[Boosting Block 1]
    A --> B2[Boosting Block 2]
    A --> B3[Boosting Block N]
    B1 --> C1[Correction 1]
    B2 --> C2[Correction 2]
    B3 --> C3[Correction N]
    C1 --> D[Gating Mechanism]
    C2 --> D
    C3 --> D
    D --> E[Learnable Weights]
    E --> F[Weighted Aggregation]
    F --> G[Ensemble Output]

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B1 fill:#fff9e6,stroke:#ffb74d
    style B2 fill:#fff9e6,stroke:#ffb74d
    style B3 fill:#fff9e6,stroke:#ffb74d
    style D fill:#f3e5f5,stroke:#9c27b0
    style F fill:#e1f5fe,stroke:#03a9f4
```
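Conceptually, the forward pass can be summarized by the following pseudocode. This is only a rough sketch of the steps above: `boosting_blocks`, `ensemble_logits`, and `normalize` are illustrative names, and the exact weight normalization and residual handling follow the underlying BoostingBlock implementation.

```python
# Illustrative pseudocode only -- names are hypothetical, not the actual source.
corrections = [block(x) for block in boosting_blocks]     # one correction per learner
alpha = normalize(ensemble_logits)                        # learnable combination weights
output = sum(a * c for a, c in zip(alpha, corrections))   # weighted aggregation
```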
Why Use This Layer?
| Challenge | Traditional Approach | BoostingEnsembleLayer's Solution |
|---|---|---|
| Ensemble Learning | Separate ensemble models | Integrated ensemble inside the neural network |
| Parallel Processing | Sequential boosting | Parallel boosting blocks |
| Weight Learning | Fixed ensemble weights | Learnable weights for optimal combination |
| End-to-End Learning | Separate training phases | End-to-end ensemble learning |
Use Cases
- Ensemble Learning: Building ensemble models in neural networks
- Parallel Boosting: Implementing parallel boosting techniques
- Weak Learner Combination: Combining multiple weak learners
- Tabular Data: Well suited to tabular datasets where ensemble methods excel
- Robust Predictions: Producing more robust predictions through ensembling
Quick Start
Basic Usage
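A minimal standalone sketch, assuming the layer is importable from `kerasfactory.layers` and accepts the parameters documented in the API reference below:

```python
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

# A batch of 32 samples with 16 tabular features.
x = np.random.rand(32, 16).astype("float32")

# Ensemble of 3 boosting blocks, each with a single 64-unit hidden layer.
layer = BoostingEnsembleLayer(num_learners=3, learner_units=64)

y = layer(x)
print(y.shape)  # (32, 16) -- the layer preserves the input shape
```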
In a Sequential Model
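A sketch of using the layer inside `keras.Sequential`; the layer sizes and the classification head are illustrative choices, not requirements:

```python
import keras
from kerasfactory.layers import BoostingEnsembleLayer

model = keras.Sequential([
    keras.Input(shape=(16,)),
    BoostingEnsembleLayer(num_learners=4, learner_units=32),
    keras.layers.Dense(1, activation="sigmoid"),  # binary-classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```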
In a Functional Model
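A sketch of the same idea with the Keras functional API; the downstream MLP head is an illustrative assumption:

```python
import keras
from kerasfactory.layers import BoostingEnsembleLayer

inputs = keras.Input(shape=(16,))

# The ensemble refines the raw features; a small MLP head makes the prediction.
x = BoostingEnsembleLayer(num_learners=3, learner_units=64)(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse")
```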
Advanced Configuration
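A sketch of a more heavily configured ensemble, using only the parameters documented below; the specific values are illustrative:

```python
from kerasfactory.layers import BoostingEnsembleLayer

layer = BoostingEnsembleLayer(
    num_learners=8,              # more ensemble members
    learner_units=[128, 64],     # two hidden layers per boosting block
    hidden_activation="selu",
    output_activation=None,
    gamma_trainable=True,        # learn each block's scaling factor gamma
    dropout_rate=0.1,            # regularize the individual blocks
    name="boosting_ensemble",
)
```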
API Reference
kerasfactory.layers.BoostingEnsembleLayer
This module implements a BoostingEnsembleLayer that aggregates multiple BoostingBlocks in parallel. Their outputs are combined via learnable weights to form an ensemble prediction. This is similar in spirit to boosting ensembles but implemented in a differentiable, end-to-end manner.
Classes
BoostingEnsembleLayer
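A sketch of the constructor signature, reconstructed from the parameter table below and assuming the class subclasses `keras.layers.Layer`; the actual source may differ slightly:

```python
class BoostingEnsembleLayer(keras.layers.Layer):
    def __init__(
        self,
        num_learners: int = 3,
        learner_units: int | list[int] = 64,
        hidden_activation: str = "relu",
        output_activation: str | None = None,
        gamma_trainable: bool = True,
        dropout_rate: float | None = None,
        name: str | None = None,
        **kwargs,
    ) -> None: ...
```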
Ensemble layer of boosting blocks for tabular data.
This layer aggregates multiple boosting blocks (weak learners) in parallel. Each learner produces a correction to the input. A gating mechanism (via learnable weights) then computes a weighted sum of the learners' outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_learners` | `int` | Number of boosting blocks in the ensemble. | `3` |
| `learner_units` | `int \| list[int]` | Number of hidden units in each boosting block. Can be an int for a single hidden layer or a list of ints for multiple hidden layers. | `64` |
| `hidden_activation` | `str` | Activation function for hidden layers in boosting blocks. | `'relu'` |
| `output_activation` | `str \| None` | Activation function for the output layer in boosting blocks. | `None` |
| `gamma_trainable` | `bool` | Whether the scaling factor gamma in boosting blocks is trainable. | `True` |
| `dropout_rate` | `float \| None` | Optional dropout rate to apply in boosting blocks. | `None` |
| `name` | `str \| None` | Optional name for the layer. | `None` |
Input shape
N-D tensor with shape: (batch_size, ..., input_dim)
Output shape
Same shape as input: (batch_size, ..., input_dim)
Example
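A minimal usage sketch consistent with the documented input/output shapes; the batch size, feature count, and parameter values are illustrative:

```python
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

x = np.random.rand(32, 16).astype("float32")

ensemble = BoostingEnsembleLayer(
    num_learners=3,
    learner_units=64,
    dropout_rate=0.1,
)
y = ensemble(x)
assert y.shape == x.shape  # (32, 16): output shape matches the input
```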
Initialize the BoostingEnsembleLayer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_learners` | `int` | Number of boosting learners. | `3` |
| `learner_units` | `int \| list[int]` | Number of units per learner, or a list of units. | `64` |
| `hidden_activation` | `str` | Activation function for hidden layers. | `'relu'` |
| `output_activation` | `str \| None` | Activation function for the output layer. | `None` |
| `gamma_trainable` | `bool` | Whether the gamma parameter is trainable. | `True` |
| `dropout_rate` | `float \| None` | Dropout rate. | `None` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/BoostingEnsembleLayer.py
Parameters Deep Dive
num_learners (int)
- Purpose: Number of boosting blocks in the ensemble
- Range: 2 to 20+ (typically 3-8)
- Impact: More learners = more ensemble diversity but more parameters
- Recommendation: Start with 3-5, scale based on data complexity
learner_units (int or list)
- Purpose: Number of hidden units in each boosting block
- Range: 16 to 256+ (typically 32-128)
- Impact: Larger values = more complex individual learners
- Recommendation: Start with 64, scale based on data complexity
hidden_activation (str)
- Purpose: Activation function for hidden layers in boosting blocks
- Options: 'relu', 'selu', 'tanh', 'sigmoid', etc.
- Default: 'relu'
- Impact: Affects individual learner behavior
- Recommendation: Use 'relu' for most cases, 'selu' for deeper networks
dropout_rate (float, optional)
- Purpose: Dropout rate for regularization in boosting blocks
- Range: 0.0 to 0.5 (typically 0.1-0.2)
- Impact: Higher values = more regularization
- Recommendation: Use 0.1-0.2 for regularization
Performance Characteristics
- Speed: Fast for small to medium ensembles; cost scales with the number of learners
- Memory: Moderate memory usage due to multiple learners
- Accuracy: Excellent for ensemble learning
- Best For: Tabular data where ensemble methods are effective
Examples
Example 1: Ensemble Learning
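An end-to-end training sketch on synthetic tabular data; the dataset, model head, and training budget are illustrative stand-ins for a real problem:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

# Synthetic binary-classification data standing in for a real tabular dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, :5].sum(axis=1) > 0).astype("float32")

# Feature refinement via the boosting ensemble, followed by a small MLP head.
inputs = keras.Input(shape=(20,))
h = BoostingEnsembleLayer(num_learners=5, learner_units=64, dropout_rate=0.1)(inputs)
h = keras.layers.Dense(32, activation="relu")(h)
outputs = keras.layers.Dense(1, activation="sigmoid")(h)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```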
Example 2: Ensemble Analysis
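A sketch of inspecting the ensemble after building it. Because the gating/combination weights are trainable, they appear in the standard Keras `layer.weights` collection alongside each block's dense kernels; the exact variable names depend on the implementation:

```python
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

layer = BoostingEnsembleLayer(num_learners=4, learner_units=32)
_ = layer(np.random.rand(8, 10).astype("float32"))  # call once so the layer builds its weights

# List every trainable variable the ensemble created.
for w in layer.weights:
    print(w.name, tuple(w.shape))
```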
Example 3: Ensemble Comparison
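A sketch comparing different ensemble sizes on synthetic regression data; the data, candidate sizes, and training budget are illustrative:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 15)).astype("float32")
y = (np.sin(X[:, 0]) + X[:, 1] * X[:, 2]).astype("float32")

def build_model(num_learners: int) -> keras.Model:
    """Small regression model whose only varying piece is the ensemble size."""
    inputs = keras.Input(shape=(15,))
    h = BoostingEnsembleLayer(num_learners=num_learners, learner_units=32)(inputs)
    outputs = keras.layers.Dense(1)(h)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

for n in (2, 4, 8):
    history = build_model(n).fit(
        X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0
    )
    print(f"num_learners={n}: val_loss={history.history['val_loss'][-1]:.4f}")
```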
Tips & Best Practices
- Number of Learners: Start with 3-5 learners, scale based on data complexity
- Learner Units: Use 32-64 units per learner for most applications
- Activation Functions: Use 'relu' for most cases, 'selu' for deeper networks
- Dropout: Use 0.1-0.2 dropout rate for regularization
- Ensemble Diversity: Different learners will specialize in different patterns
- Weight Learning: The layer automatically learns optimal combination weights
Common Pitfalls
- Number of Learners: Must be a positive integer
- Learner Units: Must be a positive integer or a list of positive integers
- Memory Usage: Scales with number of learners and units
- Overfitting: Can overfit with too many learners on small datasets
- Learner Utilization: Some learners may not be used effectively
Related Layers
- BoostingBlock - Individual boosting block
- SparseAttentionWeighting - Sparse attention weighting
- TabularMoELayer - Mixture of experts
- VariableSelection - Variable selection
Further Reading
- Ensemble Learning - Ensemble learning concepts
- Boosting Methods - Boosting techniques
- Parallel Processing - Parallel processing concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering