BoostingEnsembleLayer
Overview
The BoostingEnsembleLayer aggregates multiple BoostingBlocks in parallel, combining their outputs via learnable weights to form an ensemble prediction. This layer implements ensemble learning in a differentiable, end-to-end manner, allowing multiple weak learners to work together.
It is particularly effective for tabular data, where ensemble methods traditionally excel, and provides a neural-network analogue of classic boosting ensembles.
How It Works
The BoostingEnsembleLayer processes data through parallel boosting blocks:
- Parallel Processing: Creates multiple boosting blocks that process input independently
- Correction Computation: Each block computes its own correction term
- Gating Mechanism: Learns weights for combining block outputs
- Weighted Aggregation: Combines block outputs using learned weights
- Output Generation: Produces ensemble prediction
```mermaid
graph TD
    A[Input Features] --> B1[Boosting Block 1]
    A --> B2[Boosting Block 2]
    A --> B3[Boosting Block N]
    B1 --> C1[Correction 1]
    B2 --> C2[Correction 2]
    B3 --> C3[Correction N]
    C1 --> D[Gating Mechanism]
    C2 --> D
    C3 --> D
    D --> E[Learnable Weights]
    E --> F[Weighted Aggregation]
    F --> G[Ensemble Output]

    style A fill:#e6f3ff,stroke:#4a86e8
    style G fill:#e8f5e9,stroke:#66bb6a
    style B1 fill:#fff9e6,stroke:#ffb74d
    style B2 fill:#fff9e6,stroke:#ffb74d
    style B3 fill:#fff9e6,stroke:#ffb74d
    style D fill:#f3e5f5,stroke:#9c27b0
    style F fill:#e1f5fe,stroke:#03a9f4
```
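Conceptually, the forward pass can be summarized by the following pseudocode. This is only a rough sketch of the steps above: `boosting_blocks`, `ensemble_logits`, and `normalize` are illustrative names, and the exact weight normalization and residual handling follow the underlying BoostingBlock implementation.

```python
# Illustrative pseudocode only -- names are hypothetical, not the actual source.
corrections = [block(x) for block in boosting_blocks]     # one correction per learner
alpha = normalize(ensemble_logits)                        # learnable combination weights
output = sum(a * c for a, c in zip(alpha, corrections))   # weighted aggregation
```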
Why Use This Layer?
| Challenge | Traditional Approach | BoostingEnsembleLayer's Solution |
|---|---|---|
| Ensemble Learning | Separate ensemble models | Integrated ensemble inside the neural network |
| Parallel Processing | Sequential boosting | Parallel boosting blocks |
| Weight Learning | Fixed ensemble weights | Learnable weights for optimal combination |
| End-to-End Learning | Separate training phases | End-to-end ensemble learning |
Use Cases
- Ensemble Learning: Building ensemble models in neural networks
- Parallel Boosting: Implementing parallel boosting techniques
- Weak Learner Combination: Combining multiple weak learners
- Tabular Data: Well suited to tabular datasets where ensemble methods excel
- Robust Predictions: Producing more robust predictions through ensembling
Quick Start
Basic Usage
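A minimal standalone sketch, assuming the layer is importable from `kerasfactory.layers` and accepts the parameters documented in the API reference below:

```python
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

# A batch of 32 samples with 16 tabular features.
x = np.random.rand(32, 16).astype("float32")

# Ensemble of 3 boosting blocks, each with a single 64-unit hidden layer.
layer = BoostingEnsembleLayer(num_learners=3, learner_units=64)

y = layer(x)
print(y.shape)  # (32, 16) -- the layer preserves the input shape
```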
In a Sequential Model
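A sketch of using the layer inside `keras.Sequential`; the layer sizes and the classification head are illustrative choices, not requirements:

```python
import keras
from kerasfactory.layers import BoostingEnsembleLayer

model = keras.Sequential([
    keras.Input(shape=(16,)),
    BoostingEnsembleLayer(num_learners=4, learner_units=32),
    keras.layers.Dense(1, activation="sigmoid"),  # binary-classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```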
In a Functional Model
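A sketch of the same idea with the Keras functional API; the downstream MLP head is an illustrative assumption:

```python
import keras
from kerasfactory.layers import BoostingEnsembleLayer

inputs = keras.Input(shape=(16,))

# The ensemble refines the raw features; a small MLP head makes the prediction.
x = BoostingEnsembleLayer(num_learners=3, learner_units=64)(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse")
```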
Advanced Configuration
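A sketch of a more heavily configured ensemble, using only the parameters documented below; the specific values are illustrative:

```python
from kerasfactory.layers import BoostingEnsembleLayer

layer = BoostingEnsembleLayer(
    num_learners=8,              # more ensemble members
    learner_units=[128, 64],     # two hidden layers per boosting block
    hidden_activation="selu",
    output_activation=None,
    gamma_trainable=True,        # learn each block's scaling factor gamma
    dropout_rate=0.1,            # regularize the individual blocks
    name="boosting_ensemble",
)
```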
API Reference
kerasfactory.layers.BoostingEnsembleLayer
This module implements a BoostingEnsembleLayer that aggregates multiple BoostingBlocks in parallel. Their outputs are combined via learnable weights to form an ensemble prediction. This is similar in spirit to boosting ensembles but implemented in a differentiable, end-to-end manner.
Classes
BoostingEnsembleLayer
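A sketch of the constructor signature, reconstructed from the parameter table below and assuming the class subclasses `keras.layers.Layer`; the actual source may differ slightly:

```python
class BoostingEnsembleLayer(keras.layers.Layer):
    def __init__(
        self,
        num_learners: int = 3,
        learner_units: int | list[int] = 64,
        hidden_activation: str = "relu",
        output_activation: str | None = None,
        gamma_trainable: bool = True,
        dropout_rate: float | None = None,
        name: str | None = None,
        **kwargs,
    ) -> None: ...
```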
Ensemble layer of boosting blocks for tabular data.
This layer aggregates multiple boosting blocks (weak learners) in parallel. Each learner produces a correction to the input. A gating mechanism (via learnable weights) then computes a weighted sum of the learners' outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_learners` | `int` | Number of boosting blocks in the ensemble. | `3` |
| `learner_units` | `int \| list[int]` | Number of hidden units in each boosting block. Can be an int for a single hidden layer or a list of ints for multiple hidden layers. | `64` |
| `hidden_activation` | `str` | Activation function for hidden layers in boosting blocks. | `'relu'` |
| `output_activation` | `str \| None` | Activation function for the output layer in boosting blocks. | `None` |
| `gamma_trainable` | `bool` | Whether the scaling factor gamma in boosting blocks is trainable. | `True` |
| `dropout_rate` | `float \| None` | Optional dropout rate to apply in boosting blocks. | `None` |
| `name` | `str \| None` | Optional name for the layer. | `None` |
Input shape
N-D tensor with shape: (batch_size, ..., input_dim)
Output shape
Same shape as input: (batch_size, ..., input_dim)
Example
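A minimal usage sketch consistent with the documented input/output shapes; the batch size, feature count, and parameter values are illustrative:

```python
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

x = np.random.rand(32, 16).astype("float32")

ensemble = BoostingEnsembleLayer(
    num_learners=3,
    learner_units=64,
    dropout_rate=0.1,
)
y = ensemble(x)
assert y.shape == x.shape  # (32, 16): output shape matches the input
```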
Initialize the BoostingEnsembleLayer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_learners` | `int` | Number of boosting learners. | `3` |
| `learner_units` | `int \| list[int]` | Number of units per learner, or a list of units. | `64` |
| `hidden_activation` | `str` | Activation function for hidden layers. | `'relu'` |
| `output_activation` | `str \| None` | Activation function for the output layer. | `None` |
| `gamma_trainable` | `bool` | Whether the gamma parameter is trainable. | `True` |
| `dropout_rate` | `float \| None` | Dropout rate. | `None` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/BoostingEnsembleLayer.py
Parameters Deep Dive
num_learners (int)
- Purpose: Number of boosting blocks in the ensemble
- Range: 2 to 20+ (typically 3-8)
- Impact: More learners = more ensemble diversity but more parameters
- Recommendation: Start with 3-5, scale based on data complexity
learner_units (int or list)
- Purpose: Number of hidden units in each boosting block
- Range: 16 to 256+ (typically 32-128)
- Impact: Larger values = more complex individual learners
- Recommendation: Start with 64, scale based on data complexity
hidden_activation (str)
- Purpose: Activation function for hidden layers in boosting blocks
- Options: 'relu', 'selu', 'tanh', 'sigmoid', etc.
- Default: 'relu'
- Impact: Affects individual learner behavior
- Recommendation: Use 'relu' for most cases, 'selu' for deeper networks
dropout_rate (float, optional)
- Purpose: Dropout rate for regularization in boosting blocks
- Range: 0.0 to 0.5 (typically 0.1-0.2)
- Impact: Higher values = more regularization
- Recommendation: Use 0.1-0.2 for regularization
Performance Characteristics
- Speed: Fast for small to medium ensembles; cost scales with the number of learners
- Memory: Moderate memory usage due to multiple learners
- Accuracy: Excellent for ensemble learning
- Best For: Tabular data where ensemble methods are effective
Examples
Example 1: Ensemble Learning
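An end-to-end training sketch on synthetic tabular data; the dataset, model head, and training budget are illustrative stand-ins for a real problem:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

# Synthetic binary-classification data standing in for a real tabular dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, :5].sum(axis=1) > 0).astype("float32")

# Feature refinement via the boosting ensemble, followed by a small MLP head.
inputs = keras.Input(shape=(20,))
h = BoostingEnsembleLayer(num_learners=5, learner_units=64, dropout_rate=0.1)(inputs)
h = keras.layers.Dense(32, activation="relu")(h)
outputs = keras.layers.Dense(1, activation="sigmoid")(h)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```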
Example 2: Ensemble Analysis
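A sketch of inspecting the ensemble after building it. Because the gating/combination weights are trainable, they appear in the standard Keras `layer.weights` collection alongside each block's dense kernels; the exact variable names depend on the implementation:

```python
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

layer = BoostingEnsembleLayer(num_learners=4, learner_units=32)
_ = layer(np.random.rand(8, 10).astype("float32"))  # call once so the layer builds its weights

# List every trainable variable the ensemble created.
for w in layer.weights:
    print(w.name, tuple(w.shape))
```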
Example 3: Ensemble Comparison
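A sketch comparing different ensemble sizes on synthetic regression data; the data, candidate sizes, and training budget are illustrative:

```python
import keras
import numpy as np
from kerasfactory.layers import BoostingEnsembleLayer

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 15)).astype("float32")
y = (np.sin(X[:, 0]) + X[:, 1] * X[:, 2]).astype("float32")

def build_model(num_learners: int) -> keras.Model:
    """Small regression model whose only varying piece is the ensemble size."""
    inputs = keras.Input(shape=(15,))
    h = BoostingEnsembleLayer(num_learners=num_learners, learner_units=32)(inputs)
    outputs = keras.layers.Dense(1)(h)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

for n in (2, 4, 8):
    history = build_model(n).fit(
        X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0
    )
    print(f"num_learners={n}: val_loss={history.history['val_loss'][-1]:.4f}")
```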
Tips & Best Practices
- Number of Learners: Start with 3-5 learners, scale based on data complexity
- Learner Units: Use 32-64 units per learner for most applications
- Activation Functions: Use 'relu' for most cases, 'selu' for deeper networks
- Dropout: Use 0.1-0.2 dropout rate for regularization
- Ensemble Diversity: Different learners will specialize in different patterns
- Weight Learning: The layer automatically learns optimal combination weights
Common Pitfalls
- Number of Learners: Must be a positive integer
- Learner Units: Must be a positive integer or a list of positive integers
- Memory Usage: Scales with number of learners and units
- Overfitting: Can overfit with too many learners on small datasets
- Learner Utilization: Some learners may not be used effectively
Related Layers
- BoostingBlock - Individual boosting block
- SparseAttentionWeighting - Sparse attention weighting
- TabularMoELayer - Mixture of experts
- VariableSelection - Variable selection
Further Reading
- Ensemble Learning - Ensemble learning concepts
- Boosting Methods - Boosting techniques
- Parallel Processing - Parallel processing concepts
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering