
🧩 Layers API Reference

Welcome to the KerasFactory Layers documentation! All layers are designed to work exclusively with Keras 3 and provide specialized implementations for advanced tabular data processing, feature engineering, attention mechanisms, and time series forecasting.

What You'll Find Here

Each layer includes detailed documentation with:
- ✨ Complete parameter descriptions with types and defaults
- 🎯 Usage examples showing real-world applications
- ⚡ Best practices and performance considerations
- 🎨 When-to-use guidance for each layer
- 🔧 Implementation notes for developers

Modular & Composable

These layers can be combined to build complex neural network architectures tailored to your specific needs. For example:
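
The sketch below is illustrative only (the layer choices, d_model, and kernel_size are arbitrary): it embeds raw series values, adds fixed positional information, and decomposes the result into seasonal and trend parts, using layers documented further down this page.

import keras
from kerasfactory.layers import TokenEmbedding, PositionalEmbedding, SeriesDecomposition

# Illustrative composition of three layers from this page.
x = keras.random.normal((32, 100, 1))                     # (batch, time_steps, channels)

value_emb = TokenEmbedding(c_in=1, d_model=64)(x)          # (32, 100, 64)
pos_emb = PositionalEmbedding(d_model=64)(x)               # (1, 100, 64), broadcasts over the batch
h = value_emb + pos_emb                                    # combined representation

seasonal, trend = SeriesDecomposition(kernel_size=25)(h)   # both (32, 100, 64)
print(seasonal.shape, trend.shape)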

Keras 3 Compatible

All layers are built on top of Keras base classes and are fully compatible with Keras 3.

⏱️ Time Series & Forecasting

📍 PositionalEmbedding

Fixed sinusoidal positional encoding for transformers and sequence models.

kerasfactory.layers.PositionalEmbedding

Positional Embedding layer for transformer-based models.

Classes

PositionalEmbedding
PositionalEmbedding(
    d_model: int,
    max_len: int = 5000,
    name: str | None = None,
    **kwargs: Any
)

Sinusoidal positional encoding layer.

Generates fixed positional encodings using sine and cosine functions with different frequencies. These are added to input embeddings to provide positional information to the model.
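
As a point of reference, this description matches the standard sinusoidal encoding from the Transformer literature. The NumPy sketch below shows that formulation; treating it as exactly what this layer computes internally is an assumption.

import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int) -> np.ndarray:
    # Standard fixed encoding (assumes even d_model): sin on even dims, cos on odd dims.
    positions = np.arange(seq_len)[:, None]                          # (seq_len, 1)
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions * div_term)
    pe[:, 1::2] = np.cos(positions * div_term)
    return pe[None, ...]                                             # (1, seq_len, d_model)

print(sinusoidal_encoding(100, 64).shape)  # (1, 100, 64)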

Parameters:

Name Type Description Default
d_model int

Dimension of the positional embeddings.

required
max_len int

Maximum length of sequences (default: 5000).

5000
name str | None

Optional name for the layer.

None
Input shape

(batch_size, seq_len, ...)

Output shape

(1, seq_len, d_model)

Example
import keras
from kerasfactory.layers import PositionalEmbedding

# Create positional embeddings
pos_emb = PositionalEmbedding(d_model=64, max_len=512)
positions = pos_emb(keras.random.normal((32, 100, 64)))
print("Positional embeddings shape:", positions.shape)  # (1, 100, 64)

Initialize the PositionalEmbedding layer.

Parameters:

Name Type Description Default
d_model int

Dimension of positional embeddings.

required
max_len int

Maximum sequence length.

5000
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/PositionalEmbedding.py
def __init__(
    self,
    d_model: int,
    max_len: int = 5000,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the PositionalEmbedding layer.

    Args:
        d_model: Dimension of positional embeddings.
        max_len: Maximum sequence length.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._d_model = d_model
    self._max_len = max_len

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.d_model = self._d_model
    self.max_len = self._max_len
    self.pe: KerasTensor | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔧 FixedEmbedding

Non-trainable sinusoidal embeddings for discrete indices (months, days, hours, etc.).

kerasfactory.layers.FixedEmbedding

Fixed Embedding layer for temporal position encoding.

Classes

FixedEmbedding
FixedEmbedding(
    n_features: int,
    d_model: int,
    name: str | None = None,
    **kwargs: Any
)

Fixed sinusoidal embedding layer.

Provides fixed (non-trainable) sinusoidal embeddings for discrete indices, commonly used for encoding temporal features or positions.

Parameters:

Name Type Description Default
n_features int

Number of features/vocabulary size.

required
d_model int

Dimension of the embedding vectors.

required
name str | None

Optional name for the layer.

None
Input shape

(batch_size, seq_len) - integer indices

Output shape

(batch_size, seq_len, d_model)

Example
import keras
from kerasfactory.layers import FixedEmbedding

# Create fixed embedding
emb = FixedEmbedding(n_features=32, d_model=64)
indices = keras.random.randint((16, 100), minval=0, maxval=32)  # keras.random.uniform only supports float dtypes
embeddings = emb(indices)
print("Embeddings shape:", embeddings.shape)  # (16, 100, 64)

Initialize the FixedEmbedding layer.

Parameters:

Name Type Description Default
n_features int

Number of discrete features/positions.

required
d_model int

Dimension of embedding vectors.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/FixedEmbedding.py
def __init__(
    self,
    n_features: int,
    d_model: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the FixedEmbedding layer.

    Args:
        n_features: Number of discrete features/positions.
        d_model: Dimension of embedding vectors.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._n_features = n_features
    self._d_model = d_model

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.n_features = self._n_features
    self.d_model = self._d_model
    self.embedding_layer: layers.Embedding | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🎫 TokenEmbedding

1D convolution-based embedding layer for time series values.

kerasfactory.layers.TokenEmbedding

Token Embedding layer for time series using 1D convolution.

Classes

TokenEmbedding
TokenEmbedding(
    c_in: int,
    d_model: int,
    name: str | None = None,
    **kwargs: Any
)

Embeds time series values using 1D convolution.

Uses a conv1d layer with circular padding to create embeddings from raw values. Kaiming normal initialization is applied for proper training dynamics.
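
A rough standalone Keras sketch of the same idea follows. The kernel size, padding mode (Keras Conv1D has no built-in circular padding, so "same" is used here), and bias setting are assumptions for illustration, not this layer's exact configuration.

import keras
from keras import layers

# Hypothetical stand-in for the value-embedding idea:
# a Conv1D mapping (batch, time, c_in) -> (batch, time, d_model) with He/Kaiming-style init.
conv = layers.Conv1D(
    filters=64,                        # d_model
    kernel_size=3,
    padding="same",
    kernel_initializer="he_normal",
    use_bias=False,
)

x = keras.random.normal((32, 100, 1))  # (batch, time_steps, c_in)
print(conv(x).shape)                   # (32, 100, 64)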

Parameters:

Name Type Description Default
c_in int

Number of input channels.

required
d_model int

Dimension of output embeddings.

required
name str | None

Optional name for the layer.

None
Input shape

(batch_size, time_steps, channels)

Output shape

(batch_size, time_steps, d_model)

Example
import keras
from kerasfactory.layers import TokenEmbedding

# Create token embedding
token_emb = TokenEmbedding(c_in=1, d_model=64)

# Apply to time series
x = keras.random.normal((32, 100, 1))
embeddings = token_emb(x)
print("Embeddings shape:", embeddings.shape)  # (32, 100, 64)

Initialize the TokenEmbedding layer.

Parameters:

Name Type Description Default
c_in int

Number of input channels.

required
d_model int

Dimension of output embeddings.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/TokenEmbedding.py
def __init__(
    self,
    c_in: int,
    d_model: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the TokenEmbedding layer.

    Args:
        c_in: Number of input channels.
        d_model: Dimension of output embeddings.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._c_in = c_in
    self._d_model = d_model

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.c_in = self._c_in
    self.d_model = self._d_model
    self.conv: layers.Conv1D | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

⏰ TemporalEmbedding

Embedding layer for temporal features (month, day, weekday, hour, minute).

kerasfactory.layers.TemporalEmbedding

Temporal Embedding layer for time feature encoding.

Classes

TemporalEmbedding
TemporalEmbedding(
    d_model: int,
    embed_type: str = "fixed",
    freq: str = "h",
    name: str | None = None,
    **kwargs: Any
)

Embeds temporal features (month, day, weekday, hour, minute).

Creates embeddings for calendar features to capture temporal patterns. Supports both fixed and trainable embedding modes.
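
Conceptually, this amounts to one embedding table per calendar feature, with the per-feature embeddings summed. The sketch below is a simplified approximation of that idea; the vocabulary sizes and use of trainable tables are illustrative assumptions.

import keras
from keras import layers, ops

d_model = 64
sizes = {"month": 13, "day": 32, "weekday": 7, "hour": 24, "minute": 4}  # illustrative vocab sizes
tables = {k: layers.Embedding(v, d_model) for k, v in sizes.items()}

# Encoded calendar features; small indices so they are valid for every table above.
x = keras.random.randint((32, 100, 5), minval=0, maxval=4)
parts = [tables[k](x[:, :, i]) for i, k in enumerate(sizes)]     # five (32, 100, 64) tensors
out = ops.sum(ops.stack(parts, axis=0), axis=0)                  # summed -> (32, 100, 64)
print(out.shape)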

Parameters:

Name Type Description Default
d_model int

Dimension of embeddings.

required
embed_type str

Type of embedding - 'fixed' or 'learned' (default: 'fixed').

'fixed'
freq str

Frequency - 't' (minute level) or 'h' (hour level) (default: 'h').

'h'
name str | None

Optional name for the layer.

None
Input shape

(batch_size, seq_len, 5) - with encoded [month, day, weekday, hour, minute]

Output shape

(batch_size, seq_len, d_model)

Example
import keras
from kerasfactory.layers import TemporalEmbedding

# Create temporal embedding
temp_emb = TemporalEmbedding(d_model=64)

# Apply to temporal features
x = keras.random.randint((32, 100, 5), minval=0, maxval=13)  # integer calendar indices
embeddings = temp_emb(x)
print("Embeddings shape:", embeddings.shape)  # (32, 100, 64)

Initialize the TemporalEmbedding layer.

Parameters:

Name Type Description Default
d_model int

Dimension of embeddings.

required
embed_type str

Type of embedding ('fixed' or 'learned').

'fixed'
freq str

Frequency ('t' or 'h').

'h'
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/TemporalEmbedding.py
def __init__(
    self,
    d_model: int,
    embed_type: str = "fixed",
    freq: str = "h",
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the TemporalEmbedding layer.

    Args:
        d_model: Dimension of embeddings.
        embed_type: Type of embedding ('fixed' or 'learned').
        freq: Frequency ('t' or 'h').
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._d_model = d_model
    self._embed_type = embed_type
    self._freq = freq

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.d_model = self._d_model
    self.embed_type = self._embed_type
    self.freq = self._freq

    # Embedding layers
    self.minute_embed: FixedEmbedding | layers.Embedding | None = None
    self.hour_embed: FixedEmbedding | layers.Embedding | None = None
    self.weekday_embed: FixedEmbedding | layers.Embedding | None = None
    self.day_embed: FixedEmbedding | layers.Embedding | None = None
    self.month_embed: FixedEmbedding | layers.Embedding | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🎯 DataEmbeddingWithoutPosition

Combined token and temporal embedding layer for comprehensive feature representation.

kerasfactory.layers.DataEmbeddingWithoutPosition

Data Embedding layer combining value and temporal embeddings.

Classes

DataEmbeddingWithoutPosition
DataEmbeddingWithoutPosition(
    c_in: int,
    d_model: int,
    embed_type: str = "fixed",
    freq: str = "h",
    dropout: float = 0.1,
    name: str | None = None,
    **kwargs: Any
)

Combines token (value) and temporal embeddings.

Embeds time series values using token embedding and optionally adds temporal features. Applies dropout after combining embeddings.
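
A hand-written approximation of that combination, using the TokenEmbedding and TemporalEmbedding layers documented above (the real layer wires these together internally, so treat this as a sketch rather than its implementation):

import keras
from keras import layers
from kerasfactory.layers import TokenEmbedding, TemporalEmbedding

x = keras.random.normal((32, 100, 1))                             # raw values
x_mark = keras.random.randint((32, 100, 5), minval=0, maxval=4)   # encoded calendar features

value_emb = TokenEmbedding(c_in=1, d_model=64)(x)                  # (32, 100, 64)
temporal_emb = TemporalEmbedding(d_model=64)(x_mark)               # (32, 100, 64)
out = layers.Dropout(0.1)(value_emb + temporal_emb)                # dropout after combining
print(out.shape)                                                   # (32, 100, 64)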

Parameters:

Name Type Description Default
c_in int

Number of input channels.

required
d_model int

Dimension of embeddings.

required
embed_type str

Type of temporal embedding ('fixed' or 'learned').

'fixed'
freq str

Frequency for temporal features ('t' or 'h').

'h'
dropout float

Dropout rate (default: 0.1).

0.1
name str | None

Optional name for the layer.

None
Example
import keras
from kerasfactory.layers import DataEmbeddingWithoutPosition

# Create data embedding
data_emb = DataEmbeddingWithoutPosition(c_in=1, d_model=64)

# Apply to time series values
x = keras.random.normal((32, 100, 1))
x_mark = keras.random.randint((32, 100, 5), minval=0, maxval=13)  # integer calendar indices

embeddings = data_emb([x, x_mark])
print("Embeddings shape:", embeddings.shape)  # (32, 100, 64)

Initialize the DataEmbeddingWithoutPosition layer.

Parameters:

Name Type Description Default
c_in int

Number of input channels.

required
d_model int

Dimension of embeddings.

required
embed_type str

Type of temporal embedding.

'fixed'
freq str

Frequency for temporal embedding.

'h'
dropout float

Dropout rate.

0.1
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/DataEmbeddingWithoutPosition.py
def __init__(
    self,
    c_in: int,
    d_model: int,
    embed_type: str = "fixed",
    freq: str = "h",
    dropout: float = 0.1,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DataEmbeddingWithoutPosition layer.

    Args:
        c_in: Number of input channels.
        d_model: Dimension of embeddings.
        embed_type: Type of temporal embedding.
        freq: Frequency for temporal embedding.
        dropout: Dropout rate.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._c_in = c_in
    self._d_model = d_model
    self._embed_type = embed_type
    self._freq = freq
    self._dropout = dropout

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.c_in = self._c_in
    self.d_model = self._d_model
    self.embed_type = self._embed_type
    self.freq = self._freq
    self.dropout_rate = self._dropout

    # Embedding layers
    self.value_embedding: TokenEmbedding | None = None
    self.temporal_embedding: TemporalEmbedding | None = None
    self.dropout: layers.Dropout | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🏃 MovingAverage

Trend extraction layer using moving average filtering for time series.

kerasfactory.layers.MovingAverage

Moving Average layer for time series trend extraction.

Classes

MovingAverage
MovingAverage(
    kernel_size: int, name: str | None = None, **kwargs: Any
)

Extracts the trend component using moving average.

This layer computes a moving average over time series to extract the trend component. It applies padding at both ends to maintain the temporal dimension.
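
In plain NumPy, the behaviour described above looks roughly like this; the exact padding split used by the layer is an assumption (edge replication is shown here):

import numpy as np

def moving_average(x: np.ndarray, kernel_size: int) -> np.ndarray:
    # Edge-replicated moving average over axis=1 of a (batch, time, channels) array.
    front = np.repeat(x[:, :1, :], (kernel_size - 1) // 2, axis=1)
    back = np.repeat(x[:, -1:, :], kernel_size - 1 - (kernel_size - 1) // 2, axis=1)
    padded = np.concatenate([front, x, back], axis=1)
    kernel = np.ones(kernel_size) / kernel_size
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="valid"), 1, padded)

x = np.random.randn(4, 100, 8)
print(moving_average(x, kernel_size=25).shape)  # (4, 100, 8) -- time dimension preserved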

Parameters:

Name Type Description Default
kernel_size int

Size of the moving average window.

required
name str | None

Optional name for the layer.

None
Input shape

(batch_size, time_steps, channels)

Output shape

(batch_size, time_steps, channels)

Example
import keras
from kerasfactory.layers import MovingAverage

# Create sample time series data
x = keras.random.normal((32, 100, 8))  # 32 samples, 100 time steps, 8 features

# Apply moving average
moving_avg = MovingAverage(kernel_size=25)
trend = moving_avg(x)
print("Trend shape:", trend.shape)  # (32, 100, 8)

Initialize the MovingAverage layer.

Parameters:

Name Type Description Default
kernel_size int

Size of the moving average kernel.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/MovingAverage.py
def __init__(
    self,
    kernel_size: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the MovingAverage layer.

    Args:
        kernel_size: Size of the moving average kernel.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._kernel_size = kernel_size

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.kernel_size = self._kernel_size

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔀 SeriesDecomposition

Trend-seasonal decomposition using moving average.

kerasfactory.layers.SeriesDecomposition

Series Decomposition layer for time series trend-seasonal separation.

Classes

SeriesDecomposition
SeriesDecomposition(
    kernel_size: int, name: str | None = None, **kwargs: Any
)

Decomposes time series into trend and seasonal components.

Uses moving average to extract the trend component, then computes seasonal as the residual (input - trend).
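
Written out by hand with the MovingAverage layer documented above, the decomposition is essentially:

import keras
from kerasfactory.layers import MovingAverage

# Trend via moving average, seasonal as the residual (approximation of the layer's internals).
x = keras.random.normal((32, 100, 8))
trend = MovingAverage(kernel_size=25)(x)
seasonal = x - trend
print(seasonal.shape, trend.shape)  # (32, 100, 8) (32, 100, 8)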

Parameters:

Name Type Description Default
kernel_size int

Size of the moving average window.

required
name str | None

Optional name for the layer.

None
Input shape

(batch_size, time_steps, channels)

Output shape
  • seasonal: (batch_size, time_steps, channels)
  • trend: (batch_size, time_steps, channels)
Example
import keras
from kerasfactory.layers import SeriesDecomposition

# Create sample time series
x = keras.random.normal((32, 100, 8))

# Decompose into trend and seasonal
decomp = SeriesDecomposition(kernel_size=25)
seasonal, trend = decomp(x)

print(f"Seasonal shape: {seasonal.shape}")  # (32, 100, 8)
print(f"Trend shape: {trend.shape}")        # (32, 100, 8)

Initialize the SeriesDecomposition layer.

Parameters:

Name Type Description Default
kernel_size int

Size of the moving average window.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/SeriesDecomposition.py
def __init__(
    self,
    kernel_size: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the SeriesDecomposition layer.

    Args:
        kernel_size: Size of the moving average window.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._kernel_size = kernel_size

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.kernel_size = self._kernel_size
    self.moving_avg: MovingAverage | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

📊 DFTSeriesDecomposition

Frequency-based series decomposition using Discrete Fourier Transform.

kerasfactory.layers.DFTSeriesDecomposition

DFT-based Series Decomposition layer using frequency domain analysis.

Classes

DFTSeriesDecomposition
DFTSeriesDecomposition(
    top_k: int, name: str | None = None, **kwargs: Any
)

Decomposes time series using DFT (Discrete Fourier Transform).

Extracts seasonal components by selecting top-k frequencies in the frequency domain, then computes trend as the residual. This method captures periodic patterns more explicitly than moving average.
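
The frequency-domain idea can be sketched in plain NumPy as below; the exact selection criterion (for example, how the DC component and ties are handled) is an assumption, not a transcript of this layer's code:

import numpy as np

def dft_decompose(x: np.ndarray, top_k: int):
    # Keep the top_k largest-magnitude non-DC frequencies as 'seasonal'; residual is 'trend'.
    freqs = np.fft.rfft(x, axis=1)                    # (batch, n_freq, channels)
    mag = np.abs(freqs)
    mag[:, 0, :] = 0.0                                 # ignore the DC component
    thresh = np.sort(mag, axis=1)[:, -top_k, :][:, None, :]   # top_k-th largest per (batch, channel)
    filtered = np.where(mag >= thresh, freqs, 0.0)
    seasonal = np.fft.irfft(filtered, n=x.shape[1], axis=1)
    return seasonal, x - seasonal

x = np.random.randn(4, 100, 8)
seasonal, trend = dft_decompose(x, top_k=5)
print(seasonal.shape, trend.shape)  # (4, 100, 8) (4, 100, 8)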

Parameters:

Name Type Description Default
top_k int

Number of top frequencies to keep as seasonal component.

required
name str | None

Optional name for the layer.

None
Input shape

(batch_size, time_steps, channels)

Output shape
  • seasonal: (batch_size, time_steps, channels)
  • trend: (batch_size, time_steps, channels)
Example
import keras
from kerasfactory.layers import DFTSeriesDecomposition

# Create sample time series
x = keras.random.normal((32, 100, 8))

# Decompose using DFT
decomp = DFTSeriesDecomposition(top_k=5)
seasonal, trend = decomp(x)

print(f"Seasonal shape: {seasonal.shape}")  # (32, 100, 8)
print(f"Trend shape: {trend.shape}")        # (32, 100, 8)

Initialize the DFTSeriesDecomposition layer.

Parameters:

Name Type Description Default
top_k int

Number of top frequencies to keep.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/DFTSeriesDecomposition.py
def __init__(
    self,
    top_k: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DFTSeriesDecomposition layer.

    Args:
        top_k: Number of top frequencies to keep.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._top_k = top_k

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.top_k = self._top_k

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔄 ReversibleInstanceNorm

Reversible instance normalization with optional denormalization for time series.

kerasfactory.layers.ReversibleInstanceNorm

Reversible Instance Normalization layer for time series.

Classes

ReversibleInstanceNorm
ReversibleInstanceNorm(
    num_features: int,
    eps: float = 1e-05,
    affine: bool = False,
    subtract_last: bool = False,
    non_norm: bool = False,
    name: str | None = None,
    **kwargs: Any
)

Reversible Instance Normalization (RevIN) for time series.

Normalizes each series independently and enables reversible denormalization. This is useful for improving model performance by removing distributional shifts.
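
The normalize/denormalize round trip this layer enables boils down to storing per-instance statistics and reversing them later. A bare-bones NumPy sketch, omitting the affine and subtract_last options:

import numpy as np

x = np.random.randn(32, 100, 8)

mean = x.mean(axis=1, keepdims=True)       # per-instance, per-channel statistics over time
std = x.std(axis=1, keepdims=True) + 1e-5

x_norm = (x - mean) / std                  # 'norm' step
x_denorm = x_norm * std + mean             # 'denorm' step restores the original input
print(np.allclose(x, x_denorm))            # True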

Parameters:

Name Type Description Default
num_features int

Number of features/channels.

required
eps float

Small value for numerical stability (default: 1e-5).

1e-05
affine bool

Whether to use learnable scale and shift (default: False).

False
subtract_last bool

If True, normalize by last value instead of mean (default: False).

False
non_norm bool

If True, no normalization is applied (default: False).

False
name str | None

Optional name for the layer.

None
Example
import keras
from kerasfactory.layers import ReversibleInstanceNorm

# Create normalization layer
revin = ReversibleInstanceNorm(num_features=8)

# Normalize
x = keras.random.normal((32, 100, 8))
x_norm = revin(x, training=True)

# Denormalize
x_denorm = revin(x_norm, mode='denorm')

Initialize the ReversibleInstanceNorm layer.

Parameters:

Name Type Description Default
num_features int

Number of features.

required
eps float

Epsilon for numerical stability.

1e-05
affine bool

Whether to use learnable affine transformation.

False
subtract_last bool

Whether to normalize by last value.

False
non_norm bool

Whether to skip normalization.

False
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/ReversibleInstanceNorm.py
def __init__(
    self,
    num_features: int,
    eps: float = 1e-5,
    affine: bool = False,
    subtract_last: bool = False,
    non_norm: bool = False,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the ReversibleInstanceNorm layer.

    Args:
        num_features: Number of features.
        eps: Epsilon for numerical stability.
        affine: Whether to use learnable affine transformation.
        subtract_last: Whether to normalize by last value.
        non_norm: Whether to skip normalization.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._num_features = num_features
    self._eps = eps
    self._affine = affine
    self._subtract_last = subtract_last
    self._non_norm = non_norm

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.num_features = self._num_features
    self.eps = self._eps
    self.affine = self._affine
    self.subtract_last = self._subtract_last
    self.non_norm = self._non_norm

    # Learnable parameters
    self.affine_weight = None
    self.affine_bias = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🏗️ ReversibleInstanceNormMultivariate

Multivariate version of reversible instance normalization.

kerasfactory.layers.ReversibleInstanceNormMultivariate

Multivariate Reversible Instance Normalization layer.

Classes

ReversibleInstanceNormMultivariate
ReversibleInstanceNormMultivariate(
    num_features: int,
    eps: float = 1e-05,
    affine: bool = False,
    name: str | None = None,
    **kwargs: Any
)

Reversible Instance Normalization for multivariate time series.

Normalizes each series independently across the time dimension, enabling reversible denormalization. Designed for multivariate data.

Parameters:

Name Type Description Default
num_features int

Number of features/channels.

required
eps float

Small value for numerical stability (default: 1e-5).

1e-05
affine bool

Whether to use learnable scale and shift (default: False).

False
name str | None

Optional name for the layer.

None
Example
import keras
from kerasfactory.layers import ReversibleInstanceNormMultivariate

# Create normalization layer
revin = ReversibleInstanceNormMultivariate(num_features=8)

# Normalize
x = keras.random.normal((32, 100, 8))
x_norm = revin(x)

# Denormalize
x_denorm = revin(x_norm, mode='denorm')

Initialize the ReversibleInstanceNormMultivariate layer.

Parameters:

Name Type Description Default
num_features int

Number of features.

required
eps float

Epsilon for numerical stability.

1e-05
affine bool

Whether to use learnable affine transformation.

False
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/ReversibleInstanceNormMultivariate.py
def __init__(
    self,
    num_features: int,
    eps: float = 1e-5,
    affine: bool = False,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the ReversibleInstanceNormMultivariate layer.

    Args:
        num_features: Number of features.
        eps: Epsilon for numerical stability.
        affine: Whether to use learnable affine transformation.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._num_features = num_features
    self._eps = eps
    self._affine = affine

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.num_features = self._num_features
    self.eps = self._eps
    self.affine = self._affine

    # State for normalization
    self.batch_mean = None
    self.batch_std = None

    # Learnable parameters
    self.affine_weight = None
    self.affine_bias = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🌊 MultiScaleSeasonMixing

Bottom-up multi-scale seasonal pattern mixing.

kerasfactory.layers.MultiScaleSeasonMixing

Multi-Scale Season Mixing layer for hierarchical seasonal pattern mixing.

Classes

MultiScaleSeasonMixing
MultiScaleSeasonMixing(
    seq_len: int,
    down_sampling_window: int,
    down_sampling_layers: int,
    name: str | None = None,
    **kwargs: Any
)

Mixes seasonal patterns across multiple scales bottom-up.

Processes seasonal components at different temporal resolutions, mixing information bottom-up from fine to coarse scales.

Parameters:

Name Type Description Default
seq_len int

Input sequence length.

required
down_sampling_window int

Window size for downsampling.

required
down_sampling_layers int

Number of downsampling layers.

required
name str | None

Optional name for the layer.

None
Example
import keras
from kerasfactory.layers import MultiScaleSeasonMixing

# Create season mixing layer
season_mix = MultiScaleSeasonMixing(seq_len=100, down_sampling_window=2,
                                    down_sampling_layers=2)

# Process list of seasonal components at different scales
season_list = [keras.random.normal((32, 100, 8)),
               keras.random.normal((32, 50, 8))]
mixed_seasons = season_mix(season_list)
print("Number of outputs:", len(mixed_seasons))

Initialize the MultiScaleSeasonMixing layer.

Parameters:

Name Type Description Default
seq_len int

Sequence length.

required
down_sampling_window int

Downsampling window size.

required
down_sampling_layers int

Number of downsampling layers.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/MultiScaleSeasonMixing.py
def __init__(
    self,
    seq_len: int,
    down_sampling_window: int,
    down_sampling_layers: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the MultiScaleSeasonMixing layer.

    Args:
        seq_len: Sequence length.
        down_sampling_window: Downsampling window size.
        down_sampling_layers: Number of downsampling layers.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._seq_len = seq_len
    self._down_sampling_window = down_sampling_window
    self._down_sampling_layers = down_sampling_layers

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.seq_len = self._seq_len
    self.down_sampling_window = self._down_sampling_window
    self.down_sampling_layers = self._down_sampling_layers
    self.down_sampling_layers_list: list[dict[str, layers.Layer]] | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

📈 MultiScaleTrendMixing

Top-down multi-scale trend pattern mixing.

kerasfactory.layers.MultiScaleTrendMixing

Multi-Scale Trend Mixing layer for hierarchical trend pattern mixing.

Classes

MultiScaleTrendMixing
MultiScaleTrendMixing(
    seq_len: int,
    down_sampling_window: int,
    down_sampling_layers: int,
    name: str | None = None,
    **kwargs: Any
)

Mixes trend patterns across multiple scales top-down.

Processes trend components at different temporal resolutions, mixing information top-down from coarse to fine scales.

Parameters:

Name Type Description Default
seq_len int

Input sequence length.

required
down_sampling_window int

Window size for downsampling.

required
down_sampling_layers int

Number of downsampling layers.

required
name str | None

Optional name for the layer.

None
Example
import keras
from kerasfactory.layers import MultiScaleTrendMixing

# Create trend mixing layer
trend_mix = MultiScaleTrendMixing(seq_len=100, down_sampling_window=2,
                                  down_sampling_layers=2)

# Process list of trend components at different scales
trend_list = [keras.random.normal((32, 100, 8)),
              keras.random.normal((32, 50, 8))]
mixed_trends = trend_mix(trend_list)
print("Number of outputs:", len(mixed_trends))

Initialize the MultiScaleTrendMixing layer.

Parameters:

Name Type Description Default
seq_len int

Sequence length.

required
down_sampling_window int

Downsampling window size.

required
down_sampling_layers int

Number of downsampling layers.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/MultiScaleTrendMixing.py
def __init__(
    self,
    seq_len: int,
    down_sampling_window: int,
    down_sampling_layers: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the MultiScaleTrendMixing layer.

    Args:
        seq_len: Sequence length.
        down_sampling_window: Downsampling window size.
        down_sampling_layers: Number of downsampling layers.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._seq_len = seq_len
    self._down_sampling_window = down_sampling_window
    self._down_sampling_layers = down_sampling_layers

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.seq_len = self._seq_len
    self.down_sampling_window = self._down_sampling_window
    self.down_sampling_layers = self._down_sampling_layers
    self.up_sampling_layers_list: list[dict[str, layers.Layer]] | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔀 PastDecomposableMixing

Past decomposable mixing encoder block combining decomposition and multi-scale mixing.

kerasfactory.layers.PastDecomposableMixing

Past Decomposable Mixing layer for time series encoder blocks.

Classes

PastDecomposableMixing
PastDecomposableMixing(
    seq_len: int,
    pred_len: int,
    down_sampling_window: int,
    down_sampling_layers: int,
    d_model: int,
    dropout: float,
    channel_independence: int,
    decomp_method: str,
    d_ff: int,
    moving_avg: int,
    top_k: int,
    name: str | None = None,
    **kwargs: Any
)

Past Decomposable Mixing block for TimeMixer encoder.

Decomposes time series, applies multi-scale mixing to trend and seasonal components, then reconstructs the signal.

Parameters:

Name Type Description Default
seq_len int

Sequence length.

required
pred_len int

Prediction length.

required
down_sampling_window int

Downsampling window size.

required
down_sampling_layers int

Number of downsampling layers.

required
d_model int

Model dimension.

required
dropout float

Dropout rate.

required
channel_independence int

Whether to use channel-independent processing.

required
decomp_method str

Decomposition method ('moving_avg' or 'dft_decomp').

required
d_ff int

Feed-forward dimension.

required
moving_avg int

Window size for moving average.

required
top_k int

Top-k frequencies for DFT.

required
name str | None

Optional name for the layer.

None
Example
import keras
from kerasfactory.layers import PastDecomposableMixing

# Create PDM block
pdm = PastDecomposableMixing(seq_len=100, pred_len=12,
                             down_sampling_window=2,
                             down_sampling_layers=1,
                             # remaining args are required by the signature; values are illustrative
                             d_model=8, dropout=0.1, channel_independence=0,
                             decomp_method='moving_avg', d_ff=16,
                             moving_avg=25, top_k=5)

# Process multi-scale inputs
x_list = [keras.random.normal((32, 100, 8))]
output = pdm(x_list)

Initialize the PastDecomposableMixing layer.

Source code in kerasfactory/layers/PastDecomposableMixing.py
def __init__(
    self,
    seq_len: int,
    pred_len: int,
    down_sampling_window: int,
    down_sampling_layers: int,
    d_model: int,
    dropout: float,
    channel_independence: int,
    decomp_method: str,
    d_ff: int,
    moving_avg: int,
    top_k: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the PastDecomposableMixing layer."""
    # Set private attributes
    self._seq_len = seq_len
    self._pred_len = pred_len
    self._down_sampling_window = down_sampling_window
    self._down_sampling_layers = down_sampling_layers
    self._d_model = d_model
    self._dropout = dropout
    self._channel_independence = channel_independence
    self._decomp_method = decomp_method
    self._d_ff = d_ff
    self._moving_avg = moving_avg
    self._top_k = top_k

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.seq_len = self._seq_len
    self.pred_len = self._pred_len
    self.down_sampling_window = self._down_sampling_window
    self.down_sampling_layers = self._down_sampling_layers
    self.d_model = self._d_model
    self.dropout_rate = self._dropout
    self.channel_independence = self._channel_independence
    self.decomp_method = self._decomp_method
    self.d_ff = self._d_ff
    self.moving_avg_kernel = self._moving_avg
    self.top_k = self._top_k

    # Components
    self.decomposition: SeriesDecomposition | DFTSeriesDecomposition | None = None
    self.season_mixing: MultiScaleSeasonMixing | None = None
    self.trend_mixing: MultiScaleTrendMixing | None = None
    self.dense1: layers.Dense | None = None
    self.dense2: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

⏱️ TemporalMixing

MLP-based temporal mixing layer for TSMixer that applies transformations across the time dimension.

kerasfactory.layers.TemporalMixing

Temporal Mixing layer for TSMixer model.

Classes

TemporalMixing
TemporalMixing(
    n_series: int,
    input_size: int,
    dropout: float,
    name: str | None = None,
    **kwargs: Any
)

Temporal mixing layer using MLP on time dimension.

Applies batch normalization and linear transformation across the time dimension to mix temporal information while preserving the multivariate structure.
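
The "MLP over the time axis" idea can be sketched as below. The residual connection and the exact normalization placement are assumptions based on typical TSMixer blocks, not a transcript of this layer's internals.

import keras
from keras import layers, ops

input_size, n_series = 96, 7
x = keras.random.normal((32, input_size, n_series))

h = layers.BatchNormalization()(x)
h = ops.transpose(h, (0, 2, 1))                       # (batch, n_series, input_size)
h = layers.Dense(input_size, activation="relu")(h)    # mix along the time dimension
h = ops.transpose(h, (0, 2, 1))                       # back to (batch, input_size, n_series)
out = x + layers.Dropout(0.1)(h)                      # residual connection (assumed)
print(out.shape)                                      # (32, 96, 7)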

Parameters:

Name Type Description Default
n_series int

Number of time series (channels/features).

required
input_size int

Length of the time series (sequence length).

required
dropout float

Dropout rate between 0 and 1.

required
Input shape

(batch_size, input_size, n_series)

Output shape

(batch_size, input_size, n_series)

Example

layer = TemporalMixing(n_series=7, input_size=96, dropout=0.1)
x = keras.random.normal((32, 96, 7))
output = layer(x)
output.shape  # (32, 96, 7)

Initialize the TemporalMixing layer.

Parameters:

Name Type Description Default
n_series int

Number of time series.

required
input_size int

Length of time series.

required
dropout float

Dropout rate.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/TemporalMixing.py
def __init__(
    self,
    n_series: int,
    input_size: int,
    dropout: float,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the TemporalMixing layer.

    Args:
        n_series: Number of time series.
        input_size: Length of time series.
        dropout: Dropout rate.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._n_series = n_series
    self._input_size = input_size
    self._dropout = dropout

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.n_series = self._n_series
    self.input_size = self._input_size
    self.dropout_rate = self._dropout

    # Layer components
    self.temporal_norm: layers.BatchNormalization | None = None
    self.temporal_lin: layers.Dense | None = None
    self.dropout_layer: layers.Dropout | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔀 FeatureMixing

Feed-forward network mixing layer for TSMixer that learns cross-series correlations across feature dimension.

kerasfactory.layers.FeatureMixing

Feature Mixing layer for TSMixer model.

Classes

FeatureMixing
FeatureMixing(
    n_series: int,
    input_size: int,
    dropout: float,
    ff_dim: int,
    name: str | None = None,
    **kwargs: Any
)

Feature mixing layer using MLP on channel dimension.

Applies batch normalization and feed-forward network across the feature (channel) dimension to mix information between different time series while preserving temporal structure.

Parameters:

Name Type Description Default
n_series int

Number of time series (channels/features).

required
input_size int

Length of the time series (sequence length).

required
dropout float

Dropout rate between 0 and 1.

required
ff_dim int

Dimension of the hidden layer in the feed-forward network.

required
Input shape

(batch_size, input_size, n_series)

Output shape

(batch_size, input_size, n_series)

Example

layer = FeatureMixing(n_series=7, input_size=96, dropout=0.1, ff_dim=64)
x = keras.random.normal((32, 96, 7))
output = layer(x)
output.shape  # (32, 96, 7)

Initialize the FeatureMixing layer.

Parameters:

Name Type Description Default
n_series int

Number of time series.

required
input_size int

Length of time series.

required
dropout float

Dropout rate.

required
ff_dim int

Feed-forward hidden dimension.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/FeatureMixing.py
def __init__(
    self,
    n_series: int,
    input_size: int,
    dropout: float,
    ff_dim: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the FeatureMixing layer.

    Args:
        n_series: Number of time series.
        input_size: Length of time series.
        dropout: Dropout rate.
        ff_dim: Feed-forward hidden dimension.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._n_series = n_series
    self._input_size = input_size
    self._dropout = dropout
    self._ff_dim = ff_dim

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.n_series = self._n_series
    self.input_size = self._input_size
    self.dropout_rate = self._dropout
    self.ff_dim = self._ff_dim

    # Layer components
    self.feature_norm: layers.BatchNormalization | None = None
    self.feature_lin_1: layers.Dense | None = None
    self.feature_lin_2: layers.Dense | None = None
    self.dropout_layer_1: layers.Dropout | None = None
    self.dropout_layer_2: layers.Dropout | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔀 MixingLayer

Core mixing block combining TemporalMixing and FeatureMixing for the TSMixer architecture.

kerasfactory.layers.MixingLayer

Mixing Layer combining temporal and feature mixing for TSMixer.

Classes

MixingLayer
MixingLayer(
    n_series: int,
    input_size: int,
    dropout: float,
    ff_dim: int,
    name: str | None = None,
    **kwargs: Any
)

Mixing layer combining temporal and feature mixing.

A mixing layer consists of sequential temporal and feature MLPs that jointly learn temporal and cross-sectional representations.
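
Functionally, a mixing block amounts to chaining the two sub-layers documented above; the real MixingLayer does this wiring internally, so the snippet below is an equivalent-in-spirit sketch:

import keras
from kerasfactory.layers import TemporalMixing, FeatureMixing

x = keras.random.normal((32, 96, 7))
h = TemporalMixing(n_series=7, input_size=96, dropout=0.1)(x)
out = FeatureMixing(n_series=7, input_size=96, dropout=0.1, ff_dim=64)(h)
print(out.shape)  # (32, 96, 7)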

Parameters:

Name Type Description Default
n_series int

Number of time series (channels/features).

required
input_size int

Length of the time series (sequence length).

required
dropout float

Dropout rate between 0 and 1.

required
ff_dim int

Dimension of the hidden layer in the feed-forward network.

required
Input shape

(batch_size, input_size, n_series)

Output shape

(batch_size, input_size, n_series)

Example

layer = MixingLayer(n_series=7, input_size=96, dropout=0.1, ff_dim=64)
x = keras.random.normal((32, 96, 7))
output = layer(x)
output.shape  # (32, 96, 7)

Initialize the MixingLayer.

Parameters:

Name Type Description Default
n_series int

Number of time series.

required
input_size int

Length of time series.

required
dropout float

Dropout rate.

required
ff_dim int

Feed-forward hidden dimension.

required
name str | None

Optional layer name.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/MixingLayer.py
def __init__(
    self,
    n_series: int,
    input_size: int,
    dropout: float,
    ff_dim: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the MixingLayer.

    Args:
        n_series: Number of time series.
        input_size: Length of time series.
        dropout: Dropout rate.
        ff_dim: Feed-forward hidden dimension.
        name: Optional layer name.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes
    self._n_series = n_series
    self._input_size = input_size
    self._dropout = dropout
    self._ff_dim = ff_dim

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.n_series = self._n_series
    self.input_size = self._input_size
    self.dropout_rate = self._dropout
    self.ff_dim = self._ff_dim

    # Layer components
    self.temporal_mixer: TemporalMixing | None = None
    self.feature_mixer: FeatureMixing | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🎯 Feature Selection & Gating

🔀 VariableSelection

Dynamic feature selection using gated residual networks with optional context conditioning.

kerasfactory.layers.VariableSelection

This module implements a VariableSelection layer that applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It's particularly useful for dynamic feature selection in time series and tabular models.

Classes

VariableSelection
VariableSelection(
    nr_features: int,
    units: int,
    dropout_rate: float = 0.1,
    use_context: bool = False,
    name: str | None = None,
    **kwargs: Any
)

Layer for dynamic feature selection using gated residual networks.

This layer applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It can optionally use a context vector to condition the feature selection.
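
Schematically, the selection mechanism is a softmax over per-feature scores used to weight per-feature transformations. The sketch below substitutes plain Dense layers for the gated residual networks (an intentional simplification, not the layer's actual architecture):

import keras
from keras import layers, ops

nr_features, feature_dim, units = 10, 16, 32
x = keras.random.normal((32, nr_features, feature_dim))

per_feature = layers.Dense(units, activation="relu")(x)              # (32, 10, units), one transform per feature
flat = ops.reshape(x, (-1, nr_features * feature_dim))
weights = layers.Dense(nr_features, activation="softmax")(flat)      # (32, 10) feature weights

selected = ops.sum(per_feature * weights[:, :, None], axis=1)        # weighted sum -> (32, units)
print(selected.shape, weights.shape)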

Parameters:

Name Type Description Default
nr_features int

Number of input features

required
units int

Number of hidden units in the gated residual network

required
dropout_rate float

Dropout rate for regularization

0.1
use_context bool

Whether to use a context vector for conditioning

False
name str

Name for the layer

None
Input shape

If use_context is False:
- Single tensor with shape: (batch_size, nr_features, feature_dim)

If use_context is True:
- List of two tensors:
  - Features tensor with shape: (batch_size, nr_features, feature_dim)
  - Context tensor with shape: (batch_size, context_dim)

Output shape

Tuple of two tensors:
- Selected features: (batch_size, feature_dim)
- Feature weights: (batch_size, nr_features)

Example
import keras
from kerasfactory.layers import VariableSelection

# Create sample input data
x = keras.random.normal((32, 10, 16))  # 32 batches, 10 features, 16 dims per feature

# Without context
vs = VariableSelection(nr_features=10, units=32, dropout_rate=0.1)
selected, weights = vs(x)
print("Selected features shape:", selected.shape)  # (32, 16)
print("Feature weights shape:", weights.shape)  # (32, 10)

# With context
context = keras.random.normal((32, 64))  # 32 batches, 64-dim context
vs_context = VariableSelection(nr_features=10, units=32, dropout_rate=0.1, use_context=True)
selected, weights = vs_context([x, context])

Initialize the VariableSelection layer.

Parameters:

Name Type Description Default
nr_features int

Number of input features.

required
units int

Number of units in the selection network.

required
dropout_rate float

Dropout rate.

0.1
use_context bool

Whether to use context for selection.

False
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/VariableSelection.py
def __init__(
    self,
    nr_features: int,
    units: int,
    dropout_rate: float = 0.1,
    use_context: bool = False,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the VariableSelection layer.

    Args:
        nr_features: Number of input features.
        units: Number of units in the selection network.
        dropout_rate: Dropout rate.
        use_context: Whether to use context for selection.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._nr_features = nr_features
    self._units = units
    self._dropout_rate = dropout_rate
    self._use_context = use_context

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.nr_features = self._nr_features
    self.units = self._units
    self.dropout_rate = self._dropout_rate
    self.use_context = self._use_context

    # Initialize layers
    self.feature_grns: list[GatedResidualNetwork] | None = None
    self.grn_var: GatedResidualNetwork | None = None
    self.softmax: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
compute_output_shape(
    input_shape: tuple[int, ...] | list[tuple[int, ...]]
) -> list[tuple[int, ...]]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...] | list[tuple[int, ...]]

Shape of the input tensor or list of shapes if using context.

required

Returns:

Type Description
list[tuple[int, ...]]

List of shapes for the output tensors.

Source code in kerasfactory/layers/VariableSelection.py
def compute_output_shape(
    self,
    input_shape: tuple[int, ...] | list[tuple[int, ...]],
) -> list[tuple[int, ...]]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor or list of shapes if using context.

    Returns:
        List of shapes for the output tensors.
    """
    features_shape = input_shape[0] if self.use_context else input_shape

    # Handle different input shape types
    if isinstance(features_shape, list | tuple) and len(features_shape) > 0:
        batch_size = (
            int(features_shape[0])
            if isinstance(features_shape[0], int | float)
            else 1
        )
    else:
        batch_size = 1  # Default fallback

    return [
        (batch_size, self.units),  # Selected features
        (batch_size, self.nr_features),  # Feature weights
    ]

🚪 GatedFeatureSelection

Feature selection layer using gating mechanisms for conditional feature routing.

kerasfactory.layers.GatedFeatureSelection

GatedFeatureSelection(
    input_dim: int,
    reduction_ratio: int = 4,
    **kwargs: dict[str, Any]
)

Gated feature selection layer with residual connection.

This layer implements a learnable feature selection mechanism using a gating network. Each feature is assigned a dynamic importance weight between 0 and 1 through a multi-layer gating network. The gating network includes batch normalization and ReLU activations for stable training. A small residual connection (0.1) is added to maintain gradient flow.

The layer is particularly useful for:
1. Dynamic feature importance learning
2. Feature selection in time-series data
3. Attention-like mechanisms for tabular data
4. Reducing noise in input features

Example:

import numpy as np
from keras import layers, Model
from kerasfactory.layers import GatedFeatureSelection

# Create sample input data
input_dim = 20
x = np.random.normal(size=(100, input_dim))

# Build model with gated feature selection
inputs = layers.Input(shape=(input_dim,))
x = GatedFeatureSelection(input_dim=input_dim, reduction_ratio=4)(inputs)
outputs = layers.Dense(1)(x)
model = Model(inputs=inputs, outputs=outputs)

# The layer will learn which features are most important
# and dynamically adjust their contribution to the output

Parameters:

Name Type Description Default
input_dim int

Dimension of the input features

required
reduction_ratio int

Ratio to reduce the hidden dimension of the gating network. A higher ratio means fewer parameters but potentially less expressive gates. Default is 4, meaning the hidden dimension will be input_dim // 4.

4

Initialize the gated feature selection layer.

Parameters:

Name Type Description Default
input_dim int

Dimension of the input features. Must match the last dimension of the input tensor.

required
reduction_ratio int

Ratio to reduce the hidden dimension of the gating network. The hidden dimension will be max(input_dim // reduction_ratio, 1). Default is 4.

4
**kwargs dict[str, Any]

Additional layer arguments passed to the parent Layer class.

{}
Source code in kerasfactory/layers/GatedFeaturesSelection.py
def __init__(
    self,
    input_dim: int,
    reduction_ratio: int = 4,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize the gated feature selection layer.

    Args:
        input_dim: Dimension of the input features. Must match the last dimension
            of the input tensor.
        reduction_ratio: Ratio to reduce the hidden dimension of the gating network.
            The hidden dimension will be max(input_dim // reduction_ratio, 1).
            Default is 4.
        **kwargs: Additional layer arguments passed to the parent Layer class.
    """
    super().__init__(**kwargs)
    self.input_dim = input_dim
    self.reduction_ratio = reduction_ratio
    self.gate_net: Sequential | None = None

Functions

from_config classmethod
from_config(
    config: dict[str, Any]
) -> GatedFeatureSelection

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
GatedFeatureSelection

GatedFeatureSelection instance

Source code in kerasfactory/layers/GatedFeaturesSelection.py
@classmethod
def from_config(cls, config: dict[str, Any]) -> "GatedFeatureSelection":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        GatedFeatureSelection instance
    """
    return cls(**config)

🌊 GatedFeatureFusion

Combines and fuses features using gated mechanisms for adaptive feature integration.

kerasfactory.layers.GatedFeatureFusion

This module implements a GatedFeatureFusion layer that combines two feature representations through a learned gating mechanism. It's particularly useful for tabular datasets with multiple representations (e.g., raw numeric features alongside embeddings).

Classes

GatedFeatureFusion
GatedFeatureFusion(
    activation: str = "sigmoid",
    name: str | None = None,
    **kwargs: Any
)

Gated feature fusion layer for combining two feature representations.

This layer takes two inputs (e.g., numerical features and their embeddings) and fuses them using a learned gate to balance their contributions. The gate is computed using a dense layer with sigmoid activation, applied to the concatenation of both inputs.
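
A minimal sketch of the fusion, assuming the common convex-combination form gate * a + (1 - gate) * b; the exact blend the layer uses may differ:

import keras
from keras import layers, ops

feat1 = keras.random.normal((32, 10))
feat2 = keras.random.normal((32, 10))

# Sigmoid gate computed from the concatenation of both inputs (assumed blend formula below).
gate = layers.Dense(10, activation="sigmoid")(ops.concatenate([feat1, feat2], axis=-1))
fused = gate * feat1 + (1.0 - gate) * feat2
print(fused.shape)  # (32, 10)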

Parameters:

Name Type Description Default
activation str

Activation function to use for the gate. Default is 'sigmoid'.

'sigmoid'
name str | None

Optional name for the layer.

None
Input shape

A list of 2 tensors with shape: [(batch_size, ..., features), (batch_size, ..., features)] Both inputs must have the same shape.

Output shape

Tensor with shape: (batch_size, ..., features), same as each input.

Example
import keras
from kerasfactory.layers import GatedFeatureFusion

# Two representations for the same 10 features
feat1 = keras.random.normal((32, 10))
feat2 = keras.random.normal((32, 10))

fusion_layer = GatedFeatureFusion()
fused = fusion_layer([feat1, feat2])
print("Fused output shape:", fused.shape)  # Expected: (32, 10)

Initialize the GatedFeatureFusion layer.

Parameters:

Name Type Description Default
activation str

Activation function for the gate.

'sigmoid'
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/GatedFeatureFusion.py
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def __init__(
    self,
    activation: str = "sigmoid",
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the GatedFeatureFusion layer.

    Args:
        activation: Activation function for the gate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._activation = activation

    # No validation needed for activation as Keras will validate it

    # Set public attributes BEFORE calling parent's __init__
    self.activation = self._activation
    self.fusion_gate: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

📍 GatedLinearUnit

Gated linear transformation for controlling information flow in neural networks.

kerasfactory.layers.GatedLinearUnit

This module implements a GatedLinearUnit layer that applies a gated linear transformation to input tensors. It's particularly useful for controlling information flow in neural networks.

Classes

GatedLinearUnit
1
2
3
GatedLinearUnit(
    units: int, name: str | None = None, **kwargs: Any
)

GatedLinearUnit is a custom Keras layer that implements a gated linear unit.

This layer applies a dense linear transformation to the input tensor and multiplies the result with the output of a dense sigmoid transformation. The result is a tensor where the input data is filtered based on the learned weights and biases of the layer.
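
In other words, the output is roughly linear(x) * sigmoid_gate(x). A small functional sketch of that idea (illustrative only, not the layer's actual code):

import keras
from keras import layers

x = keras.random.normal((32, 16))                 # 32 samples, 16 features
linear = layers.Dense(8)(x)                       # dense linear transformation
gate = layers.Dense(8, activation="sigmoid")(x)   # dense sigmoid transformation
y = linear * gate                                 # gated output, shape (32, 8)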

Parameters:

Name Type Description Default
units int

Positive integer, dimensionality of the output space.

required
name str

Name for the layer.

None
Input shape

Tensor with shape: (batch_size, ..., input_dim)

Output shape

Tensor with shape: (batch_size, ..., units)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import GatedLinearUnit

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Create the layer
glu = GatedLinearUnit(units=8)
y = glu(x)
print("Output shape:", y.shape)  # (32, 8)

Initialize the GatedLinearUnit layer.

Parameters:

Name Type Description Default
units int

Number of units in the layer.

required
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/GatedLinearUnit.py
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
def __init__(self, units: int, name: str | None = None, **kwargs: Any) -> None:
    """Initialize the GatedLinearUnit layer.

    Args:
        units: Number of units in the layer.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._units = units

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.units = self._units
    self.linear: layers.Dense | None = None
    self.sigmoid: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

🔗 GatedResidualNetwork

Gated residual network architecture for feature processing with residual connections.

kerasfactory.layers.GatedResidualNetwork

This module implements a GatedResidualNetwork layer that combines residual connections with gated linear units for improved gradient flow and feature transformation.

Classes

GatedResidualNetwork
1
2
3
4
5
6
GatedResidualNetwork(
    units: int,
    dropout_rate: float = 0.2,
    name: str | None = None,
    **kwargs: Any
)

GatedResidualNetwork is a custom Keras layer that implements a gated residual network.

This layer applies a series of transformations to the input tensor and combines the result with the input using a residual connection. The transformations include a dense layer with ELU activation, a dense linear layer, a dropout layer, a gated linear unit layer, layer normalization, and a final dense layer.
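
A rough sketch of that pipeline (the exact wiring lives in GatedResidualNetwork.py; this is only an illustration of the idea and assumes the input width already equals units, so no residual projection is needed):

import keras
from keras import layers
from kerasfactory.layers import GatedLinearUnit

units = 16
x = keras.random.normal((32, units))
h = layers.Dense(units, activation="elu")(x)   # dense + ELU
h = layers.Dense(units)(h)                     # dense linear
h = layers.Dropout(0.2)(h)                     # dropout for regularization
gated = GatedLinearUnit(units=units)(h)        # gated linear unit
y = layers.LayerNormalization()(x + gated)     # residual connection + layer norm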

Parameters:

Name Type Description Default
units int

Positive integer, dimensionality of the output space.

required
dropout_rate float

Dropout rate for regularization. Defaults to 0.2.

0.2
name str

Name for the layer.

None
Input shape

Tensor with shape: (batch_size, ..., input_dim)

Output shape

Tensor with shape: (batch_size, ..., units)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import GatedResidualNetwork

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Create the layer
grn = GatedResidualNetwork(units=16, dropout_rate=0.2)
y = grn(x)
print("Output shape:", y.shape)  # (32, 16)

Initialize the GatedResidualNetwork.

Parameters:

Name Type Description Default
units int

Number of units in the network.

required
dropout_rate float

Dropout rate.

0.2
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/GatedResidualNetwork.py
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
def __init__(
    self,
    units: int,
    dropout_rate: float = 0.2,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the GatedResidualNetwork.

    Args:
        units: Number of units in the network.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._units = units
    self._dropout_rate = dropout_rate

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.units = self._units
    self.dropout_rate = self._dropout_rate

    # Initialize instance variables
    self.elu_dense: layers.Dense | None = None
    self.linear_dense: layers.Dense | None = None
    self.dropout: layers.Dropout | None = None
    self.gated_linear_unit: GatedLinearUnit | None = None
    self.project: layers.Dense | None = None
    self.layer_norm: layers.LayerNormalization | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

👁️ Attention Mechanisms

🎯 TabularAttention

Dual attention mechanism for tabular data with inter-feature and inter-sample attention.

kerasfactory.layers.TabularAttention

This module implements a TabularAttention layer that applies inter-feature and inter-sample attention mechanisms for tabular data. It's particularly useful for capturing complex relationships between features and samples in tabular datasets.

Classes

TabularAttention
1
2
3
4
5
6
7
TabularAttention(
    num_heads: int,
    d_model: int,
    dropout_rate: float = 0.1,
    name: str | None = None,
    **kwargs: Any
)

Custom layer to apply inter-feature and inter-sample attention for tabular data.

This layer implements a dual attention mechanism:

1. Inter-feature attention: Captures dependencies between features for each sample
2. Inter-sample attention: Captures dependencies between samples for each feature

The layer uses MultiHeadAttention for both attention mechanisms and includes layer normalization, dropout, and a feed-forward network.

Parameters:

Name Type Description Default
num_heads int

Number of attention heads

required
d_model int

Dimensionality of the attention model

required
dropout_rate float

Dropout rate for regularization

0.1
name str

Name for the layer

None
Input shape

Tensor with shape: (batch_size, num_samples, num_features)

Output shape

Tensor with shape: (batch_size, num_samples, d_model)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import TabularAttention

# Create sample input data
x = keras.random.normal((32, 100, 20))  # 32 batches, 100 samples, 20 features

# Apply tabular attention
attention = TabularAttention(num_heads=4, d_model=32, dropout_rate=0.1)
y = attention(x)
print("Output shape:", y.shape)  # (32, 100, 32)

Initialize the TabularAttention layer.

Parameters:

Name Type Description Default
num_heads int

Number of attention heads.

required
d_model int

Model dimension.

required
dropout_rate float

Dropout rate.

0.1
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/TabularAttention.py
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
def __init__(
    self,
    num_heads: int,
    d_model: int,
    dropout_rate: float = 0.1,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the TabularAttention layer.

    Args:
        num_heads: Number of attention heads.
        d_model: Model dimension.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._num_heads = num_heads
    self._d_model = d_model
    self._dropout_rate = dropout_rate

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.num_heads = self._num_heads
    self.d_model = self._d_model
    self.dropout_rate = self._dropout_rate

    # Initialize layers
    self.input_projection: layers.Dense | None = None
    self.feature_attention: layers.MultiHeadAttention | None = None
    self.feature_layernorm: layers.LayerNormalization | None = None
    self.feature_dropout: layers.Dropout | None = None
    self.feature_layernorm2: layers.LayerNormalization | None = None
    self.feature_dropout2: layers.Dropout | None = None
    self.sample_attention: layers.MultiHeadAttention | None = None
    self.sample_layernorm: layers.LayerNormalization | None = None
    self.sample_dropout: layers.Dropout | None = None
    self.sample_layernorm2: layers.LayerNormalization | None = None
    self.sample_dropout2: layers.Dropout | None = None
    self.ffn_dense1: layers.Dense | None = None
    self.ffn_dense2: layers.Dense | None = None
    self.output_projection: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

Shape of the input tensor.

required

Returns:

Type Description
tuple[int, ...]

Shape of the output tensor.

Source code in kerasfactory/layers/TabularAttention.py
223
224
225
226
227
228
229
230
231
232
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor.

    Returns:
        Shape of the output tensor.
    """
    return (input_shape[0], input_shape[1], self.d_model)

📊 MultiResolutionTabularAttention

Multi-resolution attention mechanism for capturing features at different scales.

kerasfactory.layers.MultiResolutionTabularAttention

This module implements a MultiResolutionTabularAttention layer that applies separate attention mechanisms for numerical and categorical features, along with cross-attention between them. It's particularly useful for mixed-type tabular data.

Classes

MultiResolutionTabularAttention
1
2
3
4
5
6
7
MultiResolutionTabularAttention(
    num_heads: int,
    d_model: int,
    dropout_rate: float = 0.1,
    name: str | None = None,
    **kwargs: Any
)

Custom layer to apply multi-resolution attention for mixed-type tabular data.

This layer implements separate attention mechanisms for numerical and categorical features, along with cross-attention between them. It's designed to handle the different characteristics of numerical and categorical features in tabular data.

Parameters:

Name Type Description Default
num_heads int

Number of attention heads

required
d_model int

Dimensionality of the attention model

required
dropout_rate float

Dropout rate for regularization

0.1
name str

Name for the layer

None
Input shape

List of two tensors:

- Numerical features: (batch_size, num_samples, num_numerical_features)
- Categorical features: (batch_size, num_samples, num_categorical_features)

Output shape

List of two tensors with shapes:

- Numerical features: (batch_size, num_samples, d_model)
- Categorical features: (batch_size, num_samples, d_model)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import keras
from kerasfactory.layers import MultiResolutionTabularAttention

# Create sample input data
numerical = keras.random.normal((32, 100, 10))  # 32 batches, 100 samples, 10 numerical features
categorical = keras.random.normal((32, 100, 5))  # 32 batches, 100 samples, 5 categorical features

# Apply multi-resolution attention
attention = MultiResolutionTabularAttention(num_heads=4, d_model=32, dropout_rate=0.1)
num_out, cat_out = attention([numerical, categorical])
print("Numerical output shape:", num_out.shape)  # (32, 100, 32)
print("Categorical output shape:", cat_out.shape)  # (32, 100, 32)

Initialize the MultiResolutionTabularAttention.

Parameters:

Name Type Description Default
num_heads int

Number of attention heads.

required
d_model int

Model dimension.

required
dropout_rate float

Dropout rate.

0.1
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/MultiResolutionTabularAttention.py
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
def __init__(
    self,
    num_heads: int,
    d_model: int,
    dropout_rate: float = 0.1,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the MultiResolutionTabularAttention.

    Args:
        num_heads: Number of attention heads.
        d_model: Model dimension.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._num_heads = num_heads
    self._d_model = d_model
    self._dropout_rate = dropout_rate

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.num_heads = self._num_heads
    self.d_model = self._d_model
    self.dropout_rate = self._dropout_rate

    # Initialize layers
    # Numerical features
    self.num_projection: layers.Dense | None = None
    self.num_attention: layers.MultiHeadAttention | None = None
    self.num_layernorm1: layers.LayerNormalization | None = None
    self.num_dropout1: layers.Dropout | None = None
    self.num_layernorm2: layers.LayerNormalization | None = None
    self.num_dropout2: layers.Dropout | None = None

    # Categorical features
    self.cat_projection: layers.Dense | None = None
    self.cat_attention: layers.MultiHeadAttention | None = None
    self.cat_layernorm1: layers.LayerNormalization | None = None
    self.cat_dropout1: layers.Dropout | None = None
    self.cat_layernorm2: layers.LayerNormalization | None = None
    self.cat_dropout2: layers.Dropout | None = None

    # Cross-attention
    self.num_cat_attention: layers.MultiHeadAttention | None = None
    self.cat_num_attention: layers.MultiHeadAttention | None = None
    self.cross_num_layernorm: layers.LayerNormalization | None = None
    self.cross_num_dropout: layers.Dropout | None = None
    self.cross_cat_layernorm: layers.LayerNormalization | None = None
    self.cross_cat_dropout: layers.Dropout | None = None

    # Feed-forward networks
    self.ffn_dense1: layers.Dense | None = None
    self.ffn_dense2: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: list[tuple[int, ...]]
) -> list[tuple[int, ...]]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape list[tuple[int, ...]]

List of shapes of the input tensors.

required

Returns:

Type Description
list[tuple[int, ...]]

List of shapes of the output tensors.

Source code in kerasfactory/layers/MultiResolutionTabularAttention.py
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
def compute_output_shape(
    self,
    input_shape: list[tuple[int, ...]],
) -> list[tuple[int, ...]]:
    """Compute the output shape of the layer.

    Args:
        input_shape: List of shapes of the input tensors.

    Returns:
        List of shapes of the output tensors.
    """
    num_shape, cat_shape = input_shape
    return [
        (num_shape[0], num_shape[1], self.d_model),
        (cat_shape[0], cat_shape[1], self.d_model),
    ]

🔍 InterpretableMultiHeadAttention

Interpretable multi-head attention layer with explainability features.

kerasfactory.layers.InterpretableMultiHeadAttention

Interpretable Multi-Head Attention layer implementation.

Classes

InterpretableMultiHeadAttention
1
2
3
4
5
6
InterpretableMultiHeadAttention(
    d_model: int,
    n_head: int,
    dropout_rate: float = 0.1,
    **kwargs: dict[str, Any]
)

Interpretable Multi-Head Attention layer.

This layer wraps Keras MultiHeadAttention and stores the attention scores for interpretability purposes. The attention scores can be accessed via the attention_scores attribute after calling the layer.

Parameters:

Name Type Description Default
d_model int

Size of each attention head for query, key, value.

required
n_head int

Number of attention heads.

required
dropout_rate float

Dropout probability. Default: 0.1.

0.1
**kwargs dict[str, Any]

Additional arguments passed to MultiHeadAttention. Supported arguments:

- value_dim: Size of each attention head for value.
- use_bias: Whether to use bias. Default: True.
- output_shape: Expected output shape. Default: None.
- attention_axes: Axes for attention. Default: None.
- kernel_initializer: Initializer for kernels. Default: 'glorot_uniform'.
- bias_initializer: Initializer for biases. Default: 'zeros'.
- kernel_regularizer: Regularizer for kernels. Default: None.
- bias_regularizer: Regularizer for biases. Default: None.
- activity_regularizer: Regularizer for activity. Default: None.
- kernel_constraint: Constraint for kernels. Default: None.
- bias_constraint: Constraint for biases. Default: None.
- seed: Random seed for dropout. Default: None.

{}
Call Args

query: Query tensor of shape (B, S, E), where B is the batch size, S is the sequence length, and E is the feature dimension.
key: Key tensor of shape (B, S, E).
value: Value tensor of shape (B, S, E).
training: Python boolean indicating whether the layer should behave in training mode (applying dropout) or in inference mode (no dropout).

Returns:

Name Type Description
output

Attention output of shape (B, S, E).

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import keras
from kerasfactory.layers import InterpretableMultiHeadAttention

d_model = 64
n_head = 4
seq_len = 10
batch_size = 32

layer = InterpretableMultiHeadAttention(
    d_model=d_model,
    n_head=n_head,
    kernel_initializer='he_normal',
    use_bias=False
)
query = keras.random.normal((batch_size, seq_len, d_model))
output = layer(query, query, query)
attention_scores = layer.attention_scores  # Access attention weights

Initialize the layer.

Source code in kerasfactory/layers/InterpretableMultiHeadAttention.py
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
def __init__(
    self,
    d_model: int,
    n_head: int,
    dropout_rate: float = 0.1,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize the layer."""
    # Extract MHA-specific kwargs
    mha_kwargs = {k: v for k, v in kwargs.items() if k in self._valid_mha_kwargs}
    # Remove MHA kwargs from the kwargs passed to parent
    layer_kwargs = {
        k: v for k, v in kwargs.items() if k not in self._valid_mha_kwargs
    }

    super().__init__(**layer_kwargs)
    self.d_model = d_model
    self.n_head = n_head
    self.dropout_rate = dropout_rate
    self.mha_kwargs = mha_kwargs

    # Initialize multihead attention
    self.mha = layers.MultiHeadAttention(
        num_heads=n_head,
        key_dim=d_model,
        dropout=dropout_rate,
        **mha_kwargs,
    )
    self.attention_scores: Any | None = None
Functions
from_config classmethod
1
2
3
from_config(
    config: dict[str, Any]
) -> InterpretableMultiHeadAttention

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
InterpretableMultiHeadAttention

Layer instance

Source code in kerasfactory/layers/InterpretableMultiHeadAttention.py
152
153
154
155
156
157
158
159
160
161
162
@classmethod
def from_config(cls, config: dict[str, Any]) -> "InterpretableMultiHeadAttention":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        Layer instance
    """
    return cls(**config)

🧠 TransformerBlock

Complete transformer block combining self-attention and feed-forward networks.

kerasfactory.layers.TransformerBlock

This module implements a TransformerBlock layer that applies transformer-style self-attention and feed-forward processing to input tensors. It's particularly useful for capturing complex relationships in tabular data.

Classes

TransformerBlock
1
2
3
4
5
6
7
8
TransformerBlock(
    dim_model: int = 32,
    num_heads: int = 3,
    ff_units: int = 16,
    dropout_rate: float = 0.2,
    name: str | None = None,
    **kwargs: Any
)

Transformer block with multi-head attention and feed-forward layers.

This layer implements a standard transformer block with multi-head self-attention followed by a feed-forward network, with residual connections and layer normalization.

Parameters:

Name Type Description Default
dim_model int

Dimensionality of the model.

32
num_heads int

Number of attention heads.

3
ff_units int

Number of units in the feed-forward network.

16
dropout_rate float

Dropout rate for regularization.

0.2
name str

Name for the layer.

None
Input shape

Tensor with shape: (batch_size, sequence_length, dim_model) or (batch_size, dim_model) which will be automatically reshaped.

Output shape

Tensor with shape: (batch_size, sequence_length, dim_model) or (batch_size, dim_model) matching the input shape.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import TransformerBlock

# Create sample input data
x = keras.random.normal((32, 10, 64))  # 32 samples, 10 time steps, 64 features

# Apply transformer block
transformer = TransformerBlock(dim_model=64, num_heads=4, ff_units=128, dropout_rate=0.1)
y = transformer(x)
print("Output shape:", y.shape)  # (32, 10, 64)

Initialize the TransformerBlock layer.

Parameters:

Name Type Description Default
dim_model int

Model dimension.

32
num_heads int

Number of attention heads.

3
ff_units int

Feed-forward units.

16
dropout_rate float

Dropout rate.

0.2
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/TransformerBlock.py
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
def __init__(
    self,
    dim_model: int = 32,
    num_heads: int = 3,
    ff_units: int = 16,
    dropout_rate: float = 0.2,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the TransformerBlock layer.

    Args:
        dim_model: Model dimension.
        num_heads: Number of attention heads.
        ff_units: Feed-forward units.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._dim_model = dim_model
    self._num_heads = num_heads
    self._ff_units = ff_units
    self._dropout_rate = dropout_rate

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.dim_model = self._dim_model
    self.num_heads = self._num_heads
    self.ff_units = self._ff_units
    self.dropout_rate = self._dropout_rate

    # Initialize layers
    self.multihead_attention: layers.MultiHeadAttention | None = None
    self.dropout1: layers.Dropout | None = None
    self.add1: layers.Add | None = None
    self.layer_norm1: layers.LayerNormalization | None = None
    self.ff1: layers.Dense | None = None
    self.dropout2: layers.Dropout | None = None
    self.ff2: layers.Dense | None = None
    self.add2: layers.Add | None = None
    self.layer_norm2: layers.LayerNormalization | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

Shape of the input tensor.

required

Returns:

Type Description
tuple[int, ...]

Shape of the output tensor.

Source code in kerasfactory/layers/TransformerBlock.py
198
199
200
201
202
203
204
205
206
207
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor.

    Returns:
        Shape of the output tensor.
    """
    return input_shape

📌 ColumnAttention

Attention mechanism focused on inter-column (feature) relationships.

kerasfactory.layers.ColumnAttention

Column attention mechanism for weighting features dynamically.

Classes

ColumnAttention
1
2
3
4
5
ColumnAttention(
    input_dim: int,
    hidden_dim: int | None = None,
    **kwargs: dict[str, Any]
)

Column attention mechanism to weight features dynamically.

This layer applies attention weights to each feature (column) in the input tensor. The attention weights are computed using a two-layer neural network that takes the input features and outputs attention weights for each feature.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import tensorflow as tf
from kerasfactory.layers import ColumnAttention

# Create sample data
batch_size = 32
input_dim = 10
inputs = tf.random.normal((batch_size, input_dim))

# Apply column attention
attention = ColumnAttention(input_dim=input_dim)
weighted_outputs = attention(inputs)

Initialize column attention.

Parameters:

Name Type Description Default
input_dim int

Input dimension

required
hidden_dim int | None

Hidden layer dimension. If None, uses input_dim // 2

None
**kwargs dict[str, Any]

Additional layer arguments

{}
Source code in kerasfactory/layers/ColumnAttention.py
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
def __init__(
    self,
    input_dim: int,
    hidden_dim: int | None = None,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize column attention.

    Args:
        input_dim: Input dimension
        hidden_dim: Hidden layer dimension. If None, uses input_dim // 2
        **kwargs: Additional layer arguments
    """
    super().__init__(**kwargs)
    self.input_dim = input_dim
    self.hidden_dim = hidden_dim or max(input_dim // 2, 1)

    # Initialize layer weights to None
    self.attention_net: Sequential | None = None
Functions
from_config classmethod
1
from_config(config: dict[str, Any]) -> ColumnAttention

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
ColumnAttention

ColumnAttention instance

Source code in kerasfactory/layers/ColumnAttention.py
111
112
113
114
115
116
117
118
119
120
121
@classmethod
def from_config(cls, config: dict[str, Any]) -> "ColumnAttention":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        ColumnAttention instance
    """
    return cls(**config)

📍 RowAttention

Attention mechanism focused on inter-row (sample) relationships.

kerasfactory.layers.RowAttention

Row attention mechanism for weighting samples in a batch.

Classes

RowAttention
1
2
3
4
5
RowAttention(
    feature_dim: int,
    hidden_dim: int | None = None,
    **kwargs: dict[str, Any]
)

Row attention mechanism to weight samples dynamically.

This layer applies attention weights to each sample (row) in the input tensor. The attention weights are computed using a two-layer neural network that takes each sample as input and outputs a scalar attention weight.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import tensorflow as tf
from kerasfactory.layers import RowAttention

# Create sample data
batch_size = 32
feature_dim = 10
inputs = tf.random.normal((batch_size, feature_dim))

# Apply row attention
attention = RowAttention(feature_dim=feature_dim)
weighted_outputs = attention(inputs)

Initialize row attention.

Parameters:

Name Type Description Default
feature_dim int

Number of input features

required
hidden_dim int | None

Hidden layer dimension. If None, uses feature_dim // 2

None
**kwargs dict[str, Any]

Additional layer arguments

{}
Source code in kerasfactory/layers/RowAttention.py
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
def __init__(
    self,
    feature_dim: int,
    hidden_dim: int | None = None,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize row attention.

    Args:
        feature_dim: Number of input features
        hidden_dim: Hidden layer dimension. If None, uses feature_dim // 2
        **kwargs: Additional layer arguments
    """
    super().__init__(**kwargs)
    self.feature_dim = feature_dim
    self.hidden_dim = hidden_dim or max(feature_dim // 2, 1)

    # Two-layer attention mechanism
    self.attention_net = models.Sequential(
        [
            layers.Dense(self.hidden_dim, activation="relu"),
            layers.BatchNormalization(),
            layers.Dense(1, activation="sigmoid"),
        ],
    )
Functions
from_config classmethod
1
from_config(config: dict[str, Any]) -> RowAttention

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
RowAttention

RowAttention instance

Source code in kerasfactory/layers/RowAttention.py
114
115
116
117
118
119
120
121
122
123
124
@classmethod
def from_config(cls, config: dict[str, Any]) -> "RowAttention":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        RowAttention instance
    """
    return cls(**config)

📊 Data Preprocessing & Transformation

🔄 DistributionTransformLayer

Transforms data distributions (log, Box-Cox, Yeo-Johnson, etc.) for improved analysis.

kerasfactory.layers.DistributionTransformLayer

This module implements a DistributionTransformLayer that applies various transformations to make data more normally distributed or to handle specific distribution types better. It's particularly useful for preprocessing data before anomaly detection or other statistical analyses.

Classes

DistributionTransformLayer
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
DistributionTransformLayer(
    transform_type: str = "none",
    lambda_param: float = 0.0,
    epsilon: float = 1e-10,
    min_value: float = 0.0,
    max_value: float = 1.0,
    clip_values: bool = True,
    auto_candidates: list[str] | None = None,
    name: str | None = None,
    **kwargs: Any
)

Layer for transforming data distributions to improve anomaly detection.

This layer applies various transformations to make data more normally distributed or to handle specific distribution types better. Supported transformations include log, square root, Box-Cox, Yeo-Johnson, arcsinh, cube-root, logit, quantile, robust-scale, and min-max.

When transform_type is set to 'auto', the layer automatically selects the most appropriate transformation based on the data characteristics during training.

Parameters:

Name Type Description Default
transform_type str

Type of transformation to apply. Options are 'none', 'log', 'sqrt', 'box-cox', 'yeo-johnson', 'arcsinh', 'cube-root', 'logit', 'quantile', 'robust-scale', 'min-max', or 'auto'. Default is 'none'.

'none'
lambda_param float

Parameter for parameterized transformations like Box-Cox and Yeo-Johnson. Default is 0.0.

0.0
epsilon float

Small value added to prevent numerical issues like log(0). Default is 1e-10.

1e-10
min_value float

Minimum value for min-max scaling. Default is 0.0.

0.0
max_value float

Maximum value for min-max scaling. Default is 1.0.

1.0
clip_values bool

Whether to clip values to the specified range in min-max scaling. Default is True.

True
auto_candidates list[str] | None

list of transformation types to consider when transform_type is 'auto'. If None, all available transformations will be considered. Default is None.

None
name str | None

Optional name for the layer.

None
Input shape

N-D tensor with shape: (batch_size, ..., features)

Output shape

Same shape as input: (batch_size, ..., features)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import keras
import numpy as np
from kerasfactory.layers import DistributionTransformLayer

# Create sample input data with skewed distribution
x = keras.ops.convert_to_tensor(np.random.exponential(1.0, (32, 10)), dtype="float32")  # 32 samples, 10 features

# Apply log transformation
log_transform = DistributionTransformLayer(transform_type="log")
y = log_transform(x)
print("Transformed output shape:", y.shape)  # (32, 10)

# Apply Box-Cox transformation with lambda=0.5
box_cox = DistributionTransformLayer(transform_type="box-cox", lambda_param=0.5)
z = box_cox(x)

# Apply arcsinh transformation (handles both positive and negative values)
arcsinh_transform = DistributionTransformLayer(transform_type="arcsinh")
a = arcsinh_transform(x)

# Apply min-max scaling to range [0, 1]
min_max = DistributionTransformLayer(transform_type="min-max", min_value=0.0, max_value=1.0)
b = min_max(x)

# Use automatic transformation selection
auto_transform = DistributionTransformLayer(transform_type="auto")
c = auto_transform(x)  # Will select the best transformation during training

Initialize the DistributionTransformLayer.

Parameters:

Name Type Description Default
transform_type str

Type of transformation to apply.

'none'
lambda_param float

Lambda parameter for Box-Cox transformation.

0.0
epsilon float

Small value to avoid division by zero.

1e-10
min_value float

Minimum value for clipping.

0.0
max_value float

Maximum value for clipping.

1.0
clip_values bool

Whether to clip values.

True
auto_candidates list[str] | None

List of candidate transformations for auto mode.

None
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/DistributionTransformLayer.py
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
def __init__(
    self,
    transform_type: str = "none",
    lambda_param: float = 0.0,
    epsilon: float = 1e-10,
    min_value: float = 0.0,
    max_value: float = 1.0,
    clip_values: bool = True,
    auto_candidates: list[str] | None = None,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DistributionTransformLayer.

    Args:
        transform_type: Type of transformation to apply.
        lambda_param: Lambda parameter for Box-Cox transformation.
        epsilon: Small value to avoid division by zero.
        min_value: Minimum value for clipping.
        max_value: Maximum value for clipping.
        clip_values: Whether to clip values.
        auto_candidates: List of candidate transformations for auto mode.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._transform_type = transform_type
    self._lambda_param = lambda_param
    self._epsilon = epsilon
    self._min_value = min_value
    self._max_value = max_value
    self._clip_values = clip_values
    self._auto_candidates = auto_candidates

    # Set public attributes BEFORE calling parent's __init__
    self.transform_type = self._transform_type
    self.lambda_param = self._lambda_param
    self.epsilon = self._epsilon
    self.min_value = self._min_value
    self.max_value = self._max_value
    self.clip_values = self._clip_values
    self.auto_candidates = self._auto_candidates

    # Define valid transformations
    self._valid_transforms = [
        "none",
        "log",
        "sqrt",
        "box-cox",
        "yeo-johnson",
        "arcsinh",
        "cube-root",
        "logit",
        "quantile",
        "robust-scale",
        "min-max",
        "auto",
    ]

    # Set default auto candidates if not provided
    if self.auto_candidates is None and self.transform_type == "auto":
        # Exclude 'none' and 'auto' from candidates
        self.auto_candidates = [
            t for t in self._valid_transforms if t not in ["none", "auto"]
        ]

    # Validate parameters
    self._validate_params()

    # Initialize auto-mode variables
    self._selected_transform = None
    self._is_initialized = False

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🎓 DistributionAwareEncoder

Encodes features while accounting for their underlying distributions.

kerasfactory.layers.DistributionAwareEncoder

This module implements a DistributionAwareEncoder layer that automatically detects the distribution type of input data and applies appropriate transformations and encodings. It builds upon the DistributionTransformLayer but adds more sophisticated distribution detection and specialized encoding for different distribution types.

Classes

DistributionAwareEncoder
1
2
3
4
5
6
7
8
9
DistributionAwareEncoder(
    embedding_dim: int | None = None,
    auto_detect: bool = True,
    distribution_type: str = "unknown",
    transform_type: str = "auto",
    add_distribution_embedding: bool = False,
    name: str | None = None,
    **kwargs: Any
)

Layer that automatically detects and encodes data based on its distribution.

This layer first detects the distribution type of the input data and then applies appropriate transformations and encodings. It builds upon the DistributionTransformLayer but adds more sophisticated distribution detection and specialized encoding for different distribution types.

Parameters:

Name Type Description Default
embedding_dim int | None

Dimension of the output embedding. If None, the output will have the same dimension as the input. Default is None.

None
auto_detect bool

Whether to automatically detect the distribution type. If False, the layer will use the specified distribution_type. Default is True.

True
distribution_type str

The distribution type to use if auto_detect is False. Options are "normal", "exponential", "lognormal", "uniform", "beta", "bimodal", "heavy_tailed", "mixed", "bounded", "unknown". Default is "unknown".

'unknown'
transform_type str

The transformation type to use. If "auto", the layer will automatically select the best transformation based on the detected distribution. See DistributionTransformLayer for available options. Default is "auto".

'auto'
add_distribution_embedding bool

Whether to add a learned embedding of the distribution type to the output. Default is False.

False
name str | None

Optional name for the layer.

None
Input shape

N-D tensor with shape: (batch_size, ..., features).

Output shape

If embedding_dim is None, same shape as input: (batch_size, ..., features). If embedding_dim is specified: (batch_size, ..., embedding_dim). If add_distribution_embedding is True, the output will have an additional dimension for the distribution embedding.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import keras
import numpy as np
from kerasfactory.layers import DistributionAwareEncoder

# Create sample input data with different distributions
# Normal distribution
normal_data = keras.ops.convert_to_tensor(
    np.random.normal(0, 1, (100, 10)), dtype="float32"
)

# Exponential distribution
exp_data = keras.ops.convert_to_tensor(
    np.random.exponential(1, (100, 10)), dtype="float32"
)

# Create the encoder
encoder = DistributionAwareEncoder(embedding_dim=16, add_distribution_embedding=True)

# Apply to normal data
normal_encoded = encoder(normal_data)
print("Normal encoded shape:", normal_encoded.shape)  # (100, 16)

# Apply to exponential data
exp_encoded = encoder(exp_data)
print("Exponential encoded shape:", exp_encoded.shape)  # (100, 16)

Initialize the DistributionAwareEncoder.

Parameters:

Name Type Description Default
embedding_dim int | None

Embedding dimension.

None
auto_detect bool

Whether to auto-detect distribution type.

True
distribution_type str

Type of distribution.

'unknown'
transform_type str

Type of transformation to apply.

'auto'
add_distribution_embedding bool

Whether to add distribution embedding.

False
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/DistributionAwareEncoder.py
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
def __init__(
    self,
    embedding_dim: int | None = None,
    auto_detect: bool = True,
    distribution_type: str = "unknown",
    transform_type: str = "auto",
    add_distribution_embedding: bool = False,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DistributionAwareEncoder.

    Args:
        embedding_dim: Embedding dimension.
        auto_detect: Whether to auto-detect distribution type.
        distribution_type: Type of distribution.
        transform_type: Type of transformation to apply.
        add_distribution_embedding: Whether to add distribution embedding.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._embedding_dim = embedding_dim
    self._auto_detect = auto_detect
    self._distribution_type = distribution_type
    self._transform_type = transform_type
    self._add_distribution_embedding = add_distribution_embedding

    # Define valid distribution types
    self._valid_distributions = [
        "normal",
        "exponential",
        "lognormal",
        "uniform",
        "beta",
        "bimodal",
        "heavy_tailed",
        "mixed",
        "bounded",
        "unknown",
    ]

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.embedding_dim = self._embedding_dim
    self.auto_detect = self._auto_detect
    self.distribution_type = self._distribution_type
    self.transform_type = self._transform_type
    self.add_distribution_embedding = self._add_distribution_embedding

    # Initialize instance variables
    self.distribution_transform: DistributionTransformLayer | None = None
    self.distribution_embedding: layers.Embedding | None = None
    self.projection: layers.Dense | None = None
    self.detected_distribution: layers.Variable | None = None
    self._is_initialized: bool = False

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)

📈 AdvancedNumericalEmbedding

Advanced numerical embedding layer for rich feature representations.

kerasfactory.layers.AdvancedNumericalEmbedding

This module implements an AdvancedNumericalEmbedding layer that embeds continuous numerical features into a higher-dimensional space using a combination of continuous and discrete branches.

Classes

AdvancedNumericalEmbedding
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
AdvancedNumericalEmbedding(
    embedding_dim: int = 8,
    mlp_hidden_units: int = 16,
    num_bins: int = 10,
    init_min: float | list[float] = -3.0,
    init_max: float | list[float] = 3.0,
    dropout_rate: float = 0.1,
    use_batch_norm: bool = True,
    name: str | None = None,
    **kwargs: Any
)

Advanced numerical embedding layer for continuous features.

This layer embeds each continuous numerical feature into a higher-dimensional space by combining two branches:

  1. Continuous Branch: Each feature is processed via a small MLP.
  2. Discrete Branch: Each feature is discretized into bins using learnable min/max boundaries and then an embedding is looked up for its bin.

A learnable gate combines the two branch outputs per feature and per embedding dimension. Additionally, the continuous branch uses a residual connection and optional batch normalization to improve training stability.

Parameters:

Name Type Description Default
embedding_dim int

Output embedding dimension per feature.

8
mlp_hidden_units int

Hidden units for the continuous branch MLP.

16
num_bins int

Number of bins for discretization.

10
init_min float or list

Initial minimum values for discretization boundaries. If a scalar is provided, it is applied to all features.

-3.0
init_max float or list

Initial maximum values for discretization boundaries.

3.0
dropout_rate float

Dropout rate applied to the continuous branch.

0.1
use_batch_norm bool

Whether to apply batch normalization to the continuous branch.

True
name str

Name for the layer.

None
Input shape

Tensor with shape: (batch_size, num_features)

Output shape

Tensor with shape: (batch_size, num_features, embedding_dim) or (batch_size, embedding_dim) if num_features=1

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
import keras
from kerasfactory.layers import AdvancedNumericalEmbedding

# Create sample input data
x = keras.random.normal((32, 5))  # 32 samples, 5 features

# Create the layer
embedding = AdvancedNumericalEmbedding(
    embedding_dim=8,
    mlp_hidden_units=16,
    num_bins=10
)
y = embedding(x)
print("Output shape:", y.shape)  # (32, 5, 8)

Initialize the AdvancedNumericalEmbedding layer.

Parameters:

Name Type Description Default
embedding_dim int

Embedding dimension.

8
mlp_hidden_units int

Hidden units in MLP.

16
num_bins int

Number of bins for discretization.

10
init_min float | list[float]

Minimum initialization value.

-3.0
init_max float | list[float]

Maximum initialization value.

3.0
dropout_rate float

Dropout rate.

0.1
use_batch_norm bool

Whether to use batch normalization.

True
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/AdvancedNumericalEmbedding.py
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
def __init__(
    self,
    embedding_dim: int = 8,
    mlp_hidden_units: int = 16,
    num_bins: int = 10,
    init_min: float | list[float] = -3.0,
    init_max: float | list[float] = 3.0,
    dropout_rate: float = 0.1,
    use_batch_norm: bool = True,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the AdvancedNumericalEmbedding layer.

    Args:
        embedding_dim: Embedding dimension.
        mlp_hidden_units: Hidden units in MLP.
        num_bins: Number of bins for discretization.
        init_min: Minimum initialization value.
        init_max: Maximum initialization value.
        dropout_rate: Dropout rate.
        use_batch_norm: Whether to use batch normalization.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes first
    self._embedding_dim = embedding_dim
    self._mlp_hidden_units = mlp_hidden_units
    self._num_bins = num_bins
    self._init_min = init_min
    self._init_max = init_max
    self._dropout_rate = dropout_rate
    self._use_batch_norm = use_batch_norm

    # Validate parameters
    self._validate_params()

    # Set public attributes BEFORE calling parent's __init__
    self.embedding_dim = self._embedding_dim
    self.mlp_hidden_units = self._mlp_hidden_units
    self.num_bins = self._num_bins
    self.init_min = self._init_min
    self.init_max = self._init_max
    self.dropout_rate = self._dropout_rate
    self.use_batch_norm = self._use_batch_norm

    # Initialize instance variables
    self.num_features: int | None = None
    self.hidden_layer: layers.Dense | None = None
    self.output_layer: layers.Dense | None = None
    self.dropout_layer: layers.Dropout | None = None
    self.batch_norm: layers.BatchNormalization | None = None
    self.residual_proj: layers.Dense | None = None
    self.bin_embeddings: list[layers.Embedding] = []
    self.learned_min: layers.Embedding | None = None
    self.learned_max: layers.Embedding | None = None
    self.gate: layers.Dense | None = None

    # Call parent's __init__ after setting public attributes
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

Shape of the input tensor.

required

Returns:

Type Description
tuple[int, ...]

Shape of the output tensor.

Source code in kerasfactory/layers/AdvancedNumericalEmbedding.py
334
335
336
337
338
339
340
341
342
343
344
345
346
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor.

    Returns:
        Shape of the output tensor.
    """
    if self.num_features == 1:
        return input_shape[:-1] + (self.embedding_dim,)
    else:
        return input_shape[:-1] + (self.num_features, self.embedding_dim)

📅 DateParsingLayer

Parses and processes date/time features.

kerasfactory.layers.DateParsingLayer

Date Parsing Layer for Keras 3.

This module provides a layer for parsing date strings into numerical components.

Classes

DateParsingLayer
1
DateParsingLayer(date_format: str = 'YYYY-MM-DD', **kwargs)

Layer for parsing date strings into numerical components.

This layer takes date strings in a specified format and returns a tensor containing the year, month, day of the month, and day of the week.

Parameters:

Name Type Description Default
date_format str

Format of the date strings. Currently supports 'YYYY-MM-DD' and 'YYYY/MM/DD'. Default is 'YYYY-MM-DD'.

'YYYY-MM-DD'
**kwargs

Additional keyword arguments to pass to the base layer.

{}
Input shape

String tensor of any shape.

Output shape

Same as input shape with an additional dimension of size 4 appended. For example, if input shape is [batch_size], output shape will be [batch_size, 4].
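
Example

A minimal usage sketch (string inputs generally require the TensorFlow backend; the date values below are only illustrative):

import keras
from kerasfactory.layers import DateParsingLayer

# Parse ISO-style date strings into [year, month, day, day_of_week]
dates = keras.ops.convert_to_tensor(["2024-01-15", "2024-06-30"])
parsed = DateParsingLayer(date_format="YYYY-MM-DD")(dates)
print("Parsed shape:", parsed.shape)  # Expected: (2, 4)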

Initialize the layer.

Source code in kerasfactory/layers/DateParsingLayer.py
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def __init__(
    self,
    date_format: str = "YYYY-MM-DD",
    **kwargs,
) -> None:
    """Initialize the layer."""
    # Set the date_format attribute before calling super().__init__
    self.date_format = date_format

    # Validate the date format
    self._validate_date_format()

    # Call parent's __init__ after setting attributes
    super().__init__(**kwargs)
Functions
compute_output_shape
1
compute_output_shape(input_shape) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape

Shape of the input tensor.

required

Returns:

Type Description
tuple[int, ...]

Shape of the output tensor.

Source code in kerasfactory/layers/DateParsingLayer.py
165
166
167
168
169
170
171
172
173
174
def compute_output_shape(self, input_shape) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor.

    Returns:
        Shape of the output tensor.
    """
    return input_shape + (4,)

🕐 DateEncodingLayer

Encodes dates into learnable embeddings for temporal features.

kerasfactory.layers.DateEncodingLayer

DateEncodingLayer for encoding date components into cyclical features.

This layer takes date components (year, month, day, day of week) and encodes them into cyclical features using sine and cosine transformations.

Classes

DateEncodingLayer
1
2
3
DateEncodingLayer(
    min_year: int = 1900, max_year: int = 2100, **kwargs
)

Layer for encoding date components into cyclical features.

This layer takes date components (year, month, day, day of week) and encodes them into cyclical features using sine and cosine transformations. The year is normalized to a range between 0 and 1 based on min_year and max_year.

Parameters:

Name Type Description Default
min_year int

Minimum year for normalization (default: 1900)

1900
max_year int

Maximum year for normalization (default: 2100)

2100
**kwargs

Additional layer arguments

{}
Input shape

Tensor with shape: (..., 4) containing [year, month, day, day_of_week]

Output shape

Tensor with shape: (..., 8) containing cyclical encodings: [year_sin, year_cos, month_sin, month_cos, day_sin, day_cos, dow_sin, dow_cos]
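
Example

A minimal usage sketch (the float dtype for the components is an assumption; in practice the input would usually come from DateParsingLayer):

import keras
from kerasfactory.layers import DateEncodingLayer

# One sample: [year, month, day, day_of_week]
components = keras.ops.convert_to_tensor([[2024.0, 6.0, 15.0, 5.0]])
encoded = DateEncodingLayer(min_year=1900, max_year=2100)(components)
print("Encoded shape:", encoded.shape)  # Expected: (1, 8)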

Initialize the layer.

Source code in kerasfactory/layers/DateEncodingLayer.py
34
35
36
37
38
39
40
41
42
43
44
def __init__(self, min_year: int = 1900, max_year: int = 2100, **kwargs):
    """Initialize the layer."""
    super().__init__(**kwargs)
    self.min_year = min_year
    self.max_year = max_year

    # Validate inputs
    if min_year >= max_year:
        raise ValueError(
            f"min_year ({min_year}) must be less than max_year ({max_year})",
        )
Functions
compute_output_shape
1
compute_output_shape(input_shape) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape

Shape of the input tensor

required

Returns:

Type Description
tuple[int, ...]

Output shape

Source code in kerasfactory/layers/DateEncodingLayer.py
 97
 98
 99
100
101
102
103
104
105
106
def compute_output_shape(self, input_shape) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor

    Returns:
        Output shape
    """
    return input_shape[:-1] + (8,)

🌙 SeasonLayer

Extracts and processes seasonal patterns from temporal data.

kerasfactory.layers.SeasonLayer

SeasonLayer for adding seasonal information based on month.

This layer adds seasonal information based on the month, encoding it as a one-hot vector for the four seasons: Winter, Spring, Summer, and Fall.

Classes

SeasonLayer
1
SeasonLayer(**kwargs)

Layer for adding seasonal information based on month.

This layer adds seasonal information based on the month, encoding it as a one-hot vector for the four seasons: Winter, Spring, Summer, and Fall.

Parameters:

Name Type Description Default
**kwargs

Additional layer arguments

{}
Input shape

Tensor with shape: (..., 4) containing [year, month, day, day_of_week]

Output shape

Tensor with shape: (..., 8) containing the original 4 components plus 4 one-hot encoded season values
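
Example

A minimal usage sketch (dtype and the sample values are illustrative assumptions):

import keras
from kerasfactory.layers import SeasonLayer

# One sample: [year, month, day, day_of_week]
components = keras.ops.convert_to_tensor([[2024.0, 12.0, 25.0, 2.0]])
with_season = SeasonLayer()(components)
print("Output shape:", with_season.shape)  # Expected: (1, 8)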

Initialize the layer.

Source code in kerasfactory/layers/SeasonLayer.py
30
31
32
def __init__(self, **kwargs):
    """Initialize the layer."""
    super().__init__(**kwargs)
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape,
) -> tuple[tuple[int, ...], tuple[int, ...]]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape

Shape of the input tensor

required

Returns:

Type Description
tuple[tuple[int, ...], tuple[int, ...]]

Output shape

Source code in kerasfactory/layers/SeasonLayer.py
106
107
108
109
110
111
112
113
114
115
116
117
118
def compute_output_shape(
    self,
    input_shape,
) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor

    Returns:
        Output shape
    """
    return input_shape[:-1] + (input_shape[-1] + 4,)

🔀 DifferentialPreprocessingLayer

Applies differential preprocessing transformations to features.

kerasfactory.layers.DifferentialPreprocessingLayer

This module implements a DifferentialPreprocessingLayer that applies multiple candidate transformations to tabular data and learns to combine them optimally. It also handles missing values with learnable imputation. This approach is useful for tabular data where the optimal preprocessing strategy is not known in advance.

Classes

DifferentialPreprocessingLayer
1
2
3
4
5
6
DifferentialPreprocessingLayer(
    num_features: int,
    mlp_hidden_units: int = 4,
    name: str | None = None,
    **kwargs: Any
)

Differentiable preprocessing layer for numeric tabular data with multiple candidate transformations.

This layer:
  1. Imputes missing values using a learnable imputation vector.
  2. Applies several candidate transformations: identity (pass-through), an affine transformation (learnable scaling and bias), a nonlinear transformation via a small MLP, and a log transformation (using a softplus to ensure positivity).
  3. Learns softmax combination weights to aggregate the candidates.

The entire preprocessing pipeline is differentiable, so the network learns the optimal imputation and transformation jointly with downstream tasks.

Parameters:

Name Type Description Default
num_features int

Number of numeric features in the input.

required
mlp_hidden_units int

Number of hidden units in the nonlinear branch. Default is 4.

4
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
import keras
import numpy as np
from kerasfactory.layers import DifferentialPreprocessingLayer

# Create dummy data: 6 samples, 4 features (with some missing values)
x = keras.ops.convert_to_tensor([
    [1.0, 2.0, float('nan'), 4.0],
    [2.0, float('nan'), 3.0, 4.0],
    [float('nan'), 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, float('nan')],
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
], dtype="float32")

# Instantiate the layer for 4 features.
preproc_layer = DifferentialPreprocessingLayer(num_features=4, mlp_hidden_units=8)
y = preproc_layer(x)
print(y)
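
Because the imputation vector and transformation weights are ordinary trainable variables, the layer can sit directly in front of a task head and be trained end to end. A hedged sketch (the model architecture and training settings are illustrative, not from the source):

import keras
from kerasfactory.layers import DifferentialPreprocessingLayer

inputs = keras.Input(shape=(4,))
x = DifferentialPreprocessingLayer(num_features=4)(inputs)  # learned imputation + transforms
x = keras.layers.Dense(16, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")  # preprocessing parameters train with the head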

Initialize the DifferentialPreprocessingLayer.

Parameters:

Name Type Description Default
num_features int

Number of input features.

required
mlp_hidden_units int

Number of hidden units in MLP.

4
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/DifferentialPreprocessingLayer.py
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
def __init__(
    self,
    num_features: int,
    mlp_hidden_units: int = 4,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DifferentialPreprocessingLayer.

    Args:
        num_features: Number of input features.
        mlp_hidden_units: Number of hidden units in MLP.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.num_features = num_features
    self.mlp_hidden_units = mlp_hidden_units
    self.num_candidates = 4  # We have 4 candidate branches

    # Initialize instance variables
    self.impute: layers.Embedding | None = None
    self.gamma: layers.Embedding | None = None
    self.beta: layers.Embedding | None = None
    self.mlp_hidden: layers.Dense | None = None
    self.mlp_output: layers.Dense | None = None
    self.alpha: layers.Embedding | None = None

    # Validate parameters during initialization
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🔧 DifferentiableTabularPreprocessor

Differentiable preprocessing layer for tabular data end-to-end training.

kerasfactory.layers.DifferentiableTabularPreprocessor

This module implements a DifferentiableTabularPreprocessor layer that integrates preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end. This approach is useful for tabular data with missing values and features that need normalization.

Classes

DifferentiableTabularPreprocessor
1
2
3
4
5
DifferentiableTabularPreprocessor(
    num_features: int,
    name: str | None = None,
    **kwargs: Any
)

A differentiable preprocessing layer for numeric tabular data.

This layer
  • Replaces missing values (NaNs) with a learnable imputation vector.
  • Applies a learned affine transformation (scaling and shifting) to each feature.

The idea is to integrate preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end.

Parameters:

Name Type Description Default
num_features int

Number of numeric features in the input.

required
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import keras
import numpy as np
from kerasfactory.layers import DifferentiableTabularPreprocessor

# Suppose we have tabular data with 5 numeric features
x = keras.ops.convert_to_tensor([
    [1.0, np.nan, 3.0, 4.0, 5.0],
    [2.0, 2.0, np.nan, 4.0, 5.0]
], dtype="float32")

preproc = DifferentiableTabularPreprocessor(num_features=5)
y = preproc(x)
print(y)

Initialize the DifferentiableTabularPreprocessor.

Parameters:

Name Type Description Default
num_features int

Number of input features.

required
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/DifferentiableTabularPreprocessor.py
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
def __init__(
    self,
    num_features: int,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the DifferentiableTabularPreprocessor.

    Args:
        num_features: Number of input features.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.num_features = num_features

    # Initialize instance variables
    self.impute = None
    self.gamma = None
    self.beta = None

    # Validate parameters during initialization
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🎨 CastToFloat32Layer

Type casting layer for ensuring float32 precision.

kerasfactory.layers.CastToFloat32Layer

This module implements a CastToFloat32Layer that casts input tensors to float32 data type.

Classes

CastToFloat32Layer
1
CastToFloat32Layer(name: str | None = None, **kwargs: Any)

Layer that casts input tensors to float32 data type.

This layer is useful for ensuring consistent data types in a model, especially when working with mixed precision or when receiving inputs of various data types.

Parameters:

Name Type Description Default
name str | None

Optional name for the layer.

None
Input shape

Tensor of any shape and numeric data type.

Output shape

Same as input shape, but with float32 data type.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import keras
import numpy as np
from kerasfactory.layers import CastToFloat32Layer

# Create sample input data with int64 type
x = keras.ops.convert_to_tensor(np.array([1, 2, 3], dtype=np.int64))

# Apply casting layer
cast_layer = CastToFloat32Layer()
y = cast_layer(x)

print(y.dtype)  # float32

Initialize the CastToFloat32Layer.

Parameters:

Name Type Description Default
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/CastToFloat32Layer.py
45
46
47
48
49
50
51
52
53
54
55
56
57
def __init__(self, name: str | None = None, **kwargs: Any) -> None:
    """Initialize the CastToFloat32Layer.

    Args:
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # No private attributes to set

    # No parameters to validate

    # Call parent's __init__
    super().__init__(name=name, **kwargs)
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

Shape of the input tensor.

required

Returns:

Type Description
tuple[int, ...]

Same shape as input.

Source code in kerasfactory/layers/CastToFloat32Layer.py
70
71
72
73
74
75
76
77
78
79
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape of the input tensor.

    Returns:
        Same shape as input.
    """
    return input_shape

🌐 Graph & Ensemble Methods

📊 GraphFeatureAggregation

Aggregates features from graph structures for relational learning.

kerasfactory.layers.GraphFeatureAggregation

This module implements a GraphFeatureAggregation layer that treats features as nodes in a graph and uses attention mechanisms to learn relationships between features. This approach is useful for tabular data where features have inherent relationships.

Classes

GraphFeatureAggregation
1
2
3
4
5
6
7
GraphFeatureAggregation(
    embed_dim: int = 8,
    dropout_rate: float = 0.0,
    leaky_relu_alpha: float = 0.2,
    name: str | None = None,
    **kwargs: Any
)

Graph-based feature aggregation layer with self-attention for tabular data.

This layer treats each input feature as a node and projects it into an embedding space. It then computes pairwise attention scores between features and aggregates feature information based on these scores. Finally, it projects the aggregated features back to the original feature space and adds a residual connection.

The process involves
  1. Projecting each scalar feature to an embedding (shape: [batch, num_features, embed_dim]).
  2. Computing pairwise concatenated embeddings and scoring them via a learnable attention vector.
  3. Normalizing the scores with softmax to yield a dynamic adjacency (attention) matrix.
  4. Aggregating neighboring features via weighted sum.
  5. Projecting back to a vector of original dimension, then adding a residual connection.

Parameters:

Name Type Description Default
embed_dim int

Dimensionality of the projected feature embeddings. Default is 8.

8
dropout_rate float

Dropout rate to apply on attention weights. Default is 0.0.

0.0
leaky_relu_alpha float

Alpha parameter for the LeakyReLU activation. Default is 0.2.

0.2
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import GraphFeatureAggregation

# Tabular data with 10 features
x = keras.random.normal((32, 10))

# Create the layer with an embedding dimension of 8 and dropout rate of 0.1
graph_layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)
y = graph_layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the GraphFeatureAggregation layer.

Parameters:

Name Type Description Default
embed_dim int

Embedding dimension.

8
dropout_rate float

Dropout rate.

0.0
leaky_relu_alpha float

Alpha parameter for LeakyReLU.

0.2
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/GraphFeatureAggregation.py
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
def __init__(
    self,
    embed_dim: int = 8,
    dropout_rate: float = 0.0,
    leaky_relu_alpha: float = 0.2,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the GraphFeatureAggregation layer.

    Args:
        embed_dim: Embedding dimension.
        dropout_rate: Dropout rate.
        leaky_relu_alpha: Alpha parameter for LeakyReLU.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.embed_dim = embed_dim
    self.dropout_rate = dropout_rate
    self.leaky_relu_alpha = leaky_relu_alpha

    # Initialize instance variables
    self.num_features: int | None = None
    self.projection: layers.Dense | None = None
    self.attention_a: layers.Dense | None = None
    self.attention_bias: layers.Dense | None = None
    self.leaky_relu: layers.LeakyReLU | None = None
    self.dropout_layer: layers.Dropout | None = None
    self.out_proj: layers.Dense | None = None

    # Validate parameters during initialization
    self._validate_params()
    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🧬 AdvancedGraphFeatureLayer

Advanced graph feature processing with multi-hop aggregation.

kerasfactory.layers.AdvancedGraphFeatureLayer

1
2
3
4
5
6
7
8
AdvancedGraphFeatureLayer(
    embed_dim: int,
    num_heads: int,
    dropout_rate: float = 0.0,
    hierarchical: bool = False,
    num_groups: int | None = None,
    **kwargs
)

Advanced graph-based feature layer for tabular data.

This layer projects scalar features into an embedding space and then applies multi-head self-attention to compute data-dependent dynamic adjacencies between features. It learns edge attributes by considering both the raw embeddings and their differences. Optionally, a hierarchical aggregation is applied, where features are grouped via a learned soft-assignment and then re-expanded back to the original feature space. A residual connection and layer normalization are applied before the final projection back to the original feature space.

The layer is highly configurable, allowing for control over the embedding dimension, number of attention heads, dropout rate, and hierarchical aggregation.

Notes

When to Use This Layer:
  • When working with tabular data where feature interactions are important
  • For complex feature engineering tasks where manual feature crosses are insufficient
  • When dealing with heterogeneous features that require dynamic, learned relationships
  • In scenarios where feature importance varies across different samples
  • When hierarchical feature relationships exist in your data

Best Practices:
  • Start with a small embed_dim (e.g., 16 or 32) and increase if needed
  • Use num_heads=4 or 8 for most applications
  • Enable hierarchical=True when you have many features (>20) or a known grouping structure
  • Set dropout_rate=0.1 or 0.2 for regularization during training
  • Use layer normalization (enabled by default) to stabilize training

Performance Considerations:
  • Memory usage scales quadratically with the number of features
  • Consider using hierarchical mode for large feature sets to reduce complexity
  • The layer works best with normalized input features
  • For very large feature sets (>100), consider feature pre-selection

Parameters:

Name Type Description Default
embed_dim int

Dimensionality of the projected feature embeddings. Determines the size of the learned feature representations.

required
num_heads int

Number of attention heads. Must divide embed_dim evenly. Each head learns different aspects of feature relationships.

required
dropout_rate float

Dropout rate applied to attention weights during training. Helps prevent overfitting. Defaults to 0.0.

0.0
hierarchical bool

Whether to apply hierarchical aggregation. If True, features are grouped into clusters, and aggregation is performed at the cluster level. Defaults to False.

False
num_groups int

Number of groups to cluster features into when hierarchical is True. Must be provided if hierarchical is True. Controls the granularity of hierarchical aggregation.

None

Raises:

Type Description
ValueError

If embed_dim is not divisible by num_heads. Ensures that the embedding dimension can be evenly split across attention heads.

ValueError

If hierarchical is True but num_groups is not provided. The number of groups must be specified when hierarchical aggregation is enabled.

Examples:

Basic Usage:

1
2
3
4
5
6
7
8
9
import keras
from kerasfactory.layers import AdvancedGraphFeatureLayer

# Dummy tabular data with 10 features for 32 samples.
x = keras.random.normal((32, 10))
# Create the advanced graph layer with an embedding dimension of 16 and 4 heads.
layer = AdvancedGraphFeatureLayer(embed_dim=16, num_heads=4)
y = layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

With Hierarchical Aggregation:

1
2
3
4
5
6
7
8
9
import keras
from kerasfactory.layers import AdvancedGraphFeatureLayer

# Dummy tabular data with 10 features for 32 samples.
x = keras.random.normal((32, 10))
# Create the advanced graph layer with hierarchical aggregation into 4 groups.
layer = AdvancedGraphFeatureLayer(embed_dim=16, num_heads=4, hierarchical=True, num_groups=4)
y = layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Without Training:

1
2
3
4
5
6
7
8
9
import keras
from kerasfactory.layers import AdvancedGraphFeatureLayer

# Dummy tabular data with 10 features for 32 samples.
x = keras.random.normal((32, 10))
# Create the advanced graph layer with an embedding dimension of 16 and 4 heads.
layer = AdvancedGraphFeatureLayer(embed_dim=16, num_heads=4)
y = layer(x, training=False)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the AdvancedGraphFeature layer.

Parameters:

Name Type Description Default
embed_dim int

Embedding dimension.

required
num_heads int

Number of attention heads.

required
dropout_rate float

Dropout rate.

0.0
hierarchical bool

Whether to use hierarchical attention.

False
num_groups int | None

Number of groups for hierarchical attention.

None
**kwargs

Additional keyword arguments.

{}
Source code in kerasfactory/layers/AdvancedGraphFeature.py
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
def __init__(
    self,
    embed_dim: int,
    num_heads: int,
    dropout_rate: float = 0.0,
    hierarchical: bool = False,
    num_groups: int | None = None,
    **kwargs,
) -> None:
    """Initialize the AdvancedGraphFeature layer.

    Args:
        embed_dim: Embedding dimension.
        num_heads: Number of attention heads.
        dropout_rate: Dropout rate.
        hierarchical: Whether to use hierarchical attention.
        num_groups: Number of groups for hierarchical attention.
        **kwargs: Additional keyword arguments.
    """
    # Validate parameters before setting attributes
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    if hierarchical and num_groups is None:
        raise ValueError("num_groups must be specified when hierarchical is True")

    # Set attributes before calling super().__init__
    self.embed_dim = embed_dim
    self.num_heads = num_heads
    self.dropout_rate = dropout_rate
    self.hierarchical = hierarchical
    self.num_groups = num_groups
    self.depth = embed_dim // num_heads

    super().__init__(**kwargs)

Functions

compute_output_shape
1
compute_output_shape(input_shape) -> tuple[int, ...]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape

Shape tuple (batch_size, num_features)

required

Returns:

Type Description
tuple[int, ...]

Output shape tuple (batch_size, num_features)

Source code in kerasfactory/layers/AdvancedGraphFeature.py
326
327
328
329
330
331
332
333
334
335
def compute_output_shape(self, input_shape) -> tuple[int, ...]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Shape tuple (batch_size, num_features)

    Returns:
        Output shape tuple (batch_size, num_features)
    """
    return input_shape

👥 MultiHeadGraphFeaturePreprocessor

Multi-head preprocessing for graph features with parallel aggregation.

kerasfactory.layers.MultiHeadGraphFeaturePreprocessor

This module implements a MultiHeadGraphFeaturePreprocessor layer that treats features as nodes in a graph and learns multiple "views" (heads) of the feature interactions via self-attention. This approach is useful for tabular data where complex feature relationships need to be captured.

Classes

MultiHeadGraphFeaturePreprocessor
1
2
3
4
5
6
7
MultiHeadGraphFeaturePreprocessor(
    embed_dim: int = 16,
    num_heads: int = 4,
    dropout_rate: float = 0.0,
    name: str | None = None,
    **kwargs: Any
)

Multi-head graph-based feature preprocessor for tabular data.

This layer treats each feature as a node and applies multi-head self-attention to capture and aggregate complex interactions among features. The process is:

  1. Project each scalar input into an embedding of dimension embed_dim.
  2. Split the embedding into num_heads heads.
  3. For each head, compute queries, keys, and values and calculate scaled dot-product attention across the feature dimension.
  4. Concatenate the head outputs, project back to the original feature dimension, and add a residual connection.

This mechanism allows the network to learn multiple relational views among features, which can significantly boost performance on tabular data.

Parameters:

Name Type Description Default
embed_dim int

Dimension of the feature embeddings. Default is 16.

16
num_heads int

Number of attention heads. Default is 4.

4
dropout_rate float

Dropout rate applied to attention weights. Default is 0.0.

0.0
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import MultiHeadGraphFeaturePreprocessor

# Tabular data with 10 features
x = keras.random.normal((32, 10))

# Create the layer with 16-dim embeddings and 4 attention heads
graph_preproc = MultiHeadGraphFeaturePreprocessor(embed_dim=16, num_heads=4)
y = graph_preproc(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the MultiHeadGraphFeaturePreprocessor.

Parameters:

Name Type Description Default
embed_dim int

Embedding dimension.

16
num_heads int

Number of attention heads.

4
dropout_rate float

Dropout rate.

0.0
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/MultiHeadGraphFeaturePreprocessor.py
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
def __init__(
    self,
    embed_dim: int = 16,
    num_heads: int = 4,
    dropout_rate: float = 0.0,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the MultiHeadGraphFeaturePreprocessor.

    Args:
        embed_dim: Embedding dimension.
        num_heads: Number of attention heads.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.embed_dim = embed_dim
    self.num_heads = num_heads
    self.dropout_rate = dropout_rate

    # Initialize instance variables
    self.projection: layers.Dense | None = None
    self.q_dense: layers.Dense | None = None
    self.k_dense: layers.Dense | None = None
    self.v_dense: layers.Dense | None = None
    self.out_proj: layers.Dense | None = None
    self.final_dense: layers.Dense | None = None
    self.dropout_layer: layers.Dropout | None = None
    self.num_features: int | None = None
    self.depth: int | None = None

    # Validate parameters
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)
Functions
split_heads
1
2
3
split_heads(
    x: KerasTensor, batch_size: KerasTensor
) -> KerasTensor

Split the last dimension into (num_heads, depth) and transpose.

Parameters:

Name Type Description Default
x KerasTensor

Input tensor with shape (batch_size, num_features, embed_dim).

required
batch_size KerasTensor

Batch size tensor.

required

Returns:

Type Description
KerasTensor

Tensor with shape (batch_size, num_heads, num_features, depth).

Source code in kerasfactory/layers/MultiHeadGraphFeaturePreprocessor.py
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
def split_heads(self, x: KerasTensor, batch_size: KerasTensor) -> KerasTensor:
    """Split the last dimension into (num_heads, depth) and transpose.

    Args:
        x: Input tensor with shape (batch_size, num_features, embed_dim).
        batch_size: Batch size tensor.

    Returns:
        Tensor with shape (batch_size, num_heads, num_features, depth).
    """
    # Get the actual number of features from the input tensor
    actual_num_features = ops.shape(x)[1]

    x = ops.reshape(
        x,
        (batch_size, actual_num_features, self.num_heads, self.depth),
    )
    return ops.transpose(x, (0, 2, 1, 3))
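
The reshape/transpose can be verified in isolation; a small standalone shape check using keras.ops (it does not call the layer itself):

import keras
from keras import ops

batch, num_features, embed_dim, num_heads = 2, 10, 16, 4
depth = embed_dim // num_heads

x = keras.random.normal((batch, num_features, embed_dim))
x = ops.reshape(x, (batch, num_features, num_heads, depth))
x = ops.transpose(x, (0, 2, 1, 3))
print(x.shape)  # (2, 4, 10, 4) == (batch, num_heads, num_features, depth)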

📈 BoostingBlock

Boosting ensemble block for combining weak learners.

kerasfactory.layers.BoostingBlock

This module implements a BoostingBlock layer that simulates gradient boosting behavior in a neural network. The layer computes a correction term via a configurable MLP and adds a scaled version to the input.

Classes

BoostingBlock
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
BoostingBlock(
    hidden_units: int | list[int] = 64,
    hidden_activation: str = "relu",
    output_activation: str | None = None,
    gamma_trainable: bool = True,
    gamma_initializer: str
    | initializers.Initializer = "ones",
    use_bias: bool = True,
    kernel_initializer: str
    | initializers.Initializer = "glorot_uniform",
    bias_initializer: str
    | initializers.Initializer = "zeros",
    dropout_rate: float | None = None,
    name: str | None = None,
    **kwargs: Any
)

A neural network layer that simulates gradient boosting behavior.

This layer implements a weak learner that computes a correction term via a configurable MLP and adds a scaled version of this correction to the input. Stacking several such blocks can mimic the iterative residual-correction process of gradient boosting.

The output is computed as

output = inputs + gamma * f(inputs)

where:
  • f is a configurable MLP (default: a two-layer network)
  • gamma is a learnable or fixed scaling factor

Parameters:

Name Type Description Default
hidden_units int | list[int]

Number of units in the hidden layer(s). Can be an int for single hidden layer or a list of ints for multiple hidden layers. Default is 64.

64
hidden_activation str

Activation function for hidden layers. Default is 'relu'.

'relu'
output_activation str | None

Activation function for the output layer. Default is None.

None
gamma_trainable bool

Whether the scaling factor gamma is trainable. Default is True.

True
gamma_initializer str | Initializer

Initializer for the gamma scaling factor. Default is 'ones'.

'ones'
use_bias bool

Whether to include bias terms in the dense layers. Default is True.

True
kernel_initializer str | Initializer

Initializer for the dense layer kernels. Default is 'glorot_uniform'.

'glorot_uniform'
bias_initializer str | Initializer

Initializer for the dense layer biases. Default is 'zeros'.

'zeros'
dropout_rate float | None

Optional dropout rate to apply after hidden layers. Default is None.

None
name str | None

Optional name for the layer.

None
Input shape

N-D tensor with shape: (batch_size, ..., input_dim)

Output shape

Same shape as input: (batch_size, ..., input_dim)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
import keras
from kerasfactory.layers import BoostingBlock

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Basic usage
block = BoostingBlock(hidden_units=64)
y = block(x)
print("Output shape:", y.shape)  # (32, 16)

# Advanced configuration
block = BoostingBlock(
    hidden_units=[32, 16],  # Two hidden layers
    hidden_activation='selu',
    dropout_rate=0.1,
    gamma_trainable=False
)
y = block(x)
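
Stacking several blocks approximates the iterative residual-correction loop of gradient boosting; a hedged sketch (the stack depth and task head are illustrative, not from the source):

import keras
from kerasfactory.layers import BoostingBlock

inputs = keras.Input(shape=(16,))
x = inputs
for _ in range(3):  # each block adds a scaled correction term, like one boosting round
    x = BoostingBlock(hidden_units=32)(x)
outputs = keras.layers.Dense(1)(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")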

Initialize the BoostingBlock layer.

Parameters:

Name Type Description Default
hidden_units int | list[int]

Number of hidden units or list of units per layer.

64
hidden_activation str

Activation function for hidden layers.

'relu'
output_activation str | None

Activation function for output layer.

None
gamma_trainable bool

Whether gamma parameter is trainable.

True
gamma_initializer str | Initializer

Initializer for gamma parameter.

'ones'
use_bias bool

Whether to use bias.

True
kernel_initializer str | Initializer

Initializer for kernel weights.

'glorot_uniform'
bias_initializer str | Initializer

Initializer for bias weights.

'zeros'
dropout_rate float | None

Dropout rate.

None
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/BoostingBlock.py
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
def __init__(
    self,
    hidden_units: int | list[int] = 64,
    hidden_activation: str = "relu",
    output_activation: str | None = None,
    gamma_trainable: bool = True,
    gamma_initializer: str | initializers.Initializer = "ones",
    use_bias: bool = True,
    kernel_initializer: str | initializers.Initializer = "glorot_uniform",
    bias_initializer: str | initializers.Initializer = "zeros",
    dropout_rate: float | None = None,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the BoostingBlock layer.

    Args:
        hidden_units: Number of hidden units or list of units per layer.
        hidden_activation: Activation function for hidden layers.
        output_activation: Activation function for output layer.
        gamma_trainable: Whether gamma parameter is trainable.
        gamma_initializer: Initializer for gamma parameter.
        use_bias: Whether to use bias.
        kernel_initializer: Initializer for kernel weights.
        bias_initializer: Initializer for bias weights.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set attributes before calling parent's __init__
    self._hidden_units = (
        [hidden_units] if isinstance(hidden_units, int) else hidden_units
    )
    self._hidden_activation = hidden_activation
    self._output_activation = output_activation
    self._gamma_trainable = gamma_trainable
    self._gamma_initializer = initializers.get(gamma_initializer)
    self._use_bias = use_bias
    self._kernel_initializer = initializers.get(kernel_initializer)
    self._bias_initializer = initializers.get(bias_initializer)
    self._dropout_rate = dropout_rate

    # Validate parameters
    if any(units <= 0 for units in self._hidden_units):
        raise ValueError("All hidden_units must be positive integers")
    if dropout_rate is not None and not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be between 0 and 1")

    super().__init__(name=name, **kwargs)

    # Now set public attributes
    self.hidden_units = self._hidden_units
    self.hidden_activation = self._hidden_activation
    self.output_activation = self._output_activation
    self.gamma_trainable = self._gamma_trainable
    self.gamma_initializer = self._gamma_initializer
    self.use_bias = self._use_bias
    self.kernel_initializer = self._kernel_initializer
    self.bias_initializer = self._bias_initializer
    self.dropout_rate = self._dropout_rate

🎯 BoostingEnsembleLayer

Ensemble layer implementing gradient boosting mechanisms.

kerasfactory.layers.BoostingEnsembleLayer

This module implements a BoostingEnsembleLayer that aggregates multiple BoostingBlocks in parallel. Their outputs are combined via learnable weights to form an ensemble prediction. This is similar in spirit to boosting ensembles but implemented in a differentiable, end-to-end manner.

Classes

BoostingEnsembleLayer
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
BoostingEnsembleLayer(
    num_learners: int = 3,
    learner_units: int | list[int] = 64,
    hidden_activation: str = "relu",
    output_activation: str | None = None,
    gamma_trainable: bool = True,
    dropout_rate: float | None = None,
    name: str | None = None,
    **kwargs: Any
)

Ensemble layer of boosting blocks for tabular data.

This layer aggregates multiple boosting blocks (weak learners) in parallel. Each learner produces a correction to the input. A gating mechanism (via learnable weights) then computes a weighted sum of the learners' outputs.

Parameters:

Name Type Description Default
num_learners int

Number of boosting blocks in the ensemble. Default is 3.

3
learner_units int | list[int]

Number of hidden units in each boosting block. Can be an int for single hidden layer or a list of ints for multiple hidden layers. Default is 64.

64
hidden_activation str

Activation function for hidden layers in boosting blocks. Default is 'relu'.

'relu'
output_activation str | None

Activation function for the output layer in boosting blocks. Default is None.

None
gamma_trainable bool

Whether the scaling factor gamma in boosting blocks is trainable. Default is True.

True
dropout_rate float | None

Optional dropout rate to apply in boosting blocks. Default is None.

None
name str | None

Optional name for the layer.

None
Input shape

N-D tensor with shape: (batch_size, ..., input_dim)

Output shape

Same shape as input: (batch_size, ..., input_dim)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
import keras
from kerasfactory.layers import BoostingEnsembleLayer

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Basic usage
ensemble = BoostingEnsembleLayer(num_learners=3, learner_units=64)
y = ensemble(x)
print("Ensemble output shape:", y.shape)  # (32, 16)

# Advanced configuration
ensemble = BoostingEnsembleLayer(
    num_learners=5,
    learner_units=[32, 16],  # Two hidden layers in each learner
    hidden_activation='selu',
    dropout_rate=0.1
)
y = ensemble(x)

Initialize the BoostingEnsembleLayer.

Parameters:

Name Type Description Default
num_learners int

Number of boosting learners.

3
learner_units int | list[int]

Number of units per learner or list of units.

64
hidden_activation str

Activation function for hidden layers.

'relu'
output_activation str | None

Activation function for output layer.

None
gamma_trainable bool

Whether gamma parameter is trainable.

True
dropout_rate float | None

Dropout rate.

None
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/BoostingEnsembleLayer.py
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
def __init__(
    self,
    num_learners: int = 3,
    learner_units: int | list[int] = 64,
    hidden_activation: str = "relu",
    output_activation: str | None = None,
    gamma_trainable: bool = True,
    dropout_rate: float | None = None,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the BoostingEnsembleLayer.

    Args:
        num_learners: Number of boosting learners.
        learner_units: Number of units per learner or list of units.
        hidden_activation: Activation function for hidden layers.
        output_activation: Activation function for output layer.
        gamma_trainable: Whether gamma parameter is trainable.
        dropout_rate: Dropout rate.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set private attributes before calling parent's __init__
    self._num_learners = num_learners
    self._learner_units = learner_units
    self._hidden_activation = hidden_activation
    self._output_activation = output_activation
    self._gamma_trainable = gamma_trainable
    self._dropout_rate = dropout_rate

    # Validate parameters
    if num_learners <= 0:
        raise ValueError(f"num_learners must be positive, got {num_learners}")
    if dropout_rate is not None and not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be between 0 and 1")

    # Set public attributes before calling parent's __init__
    self.num_learners = self._num_learners
    self.learner_units = self._learner_units
    self.hidden_activation = self._hidden_activation
    self.output_activation = self._output_activation
    self.gamma_trainable = self._gamma_trainable
    self.dropout_rate = self._dropout_rate
    self.learners: list[BoostingBlock] | None = None
    self.alpha: layers.Variable | None = None

    super().__init__(name=name, **kwargs)

📊 TabularMoELayer

Mixture of Experts layer optimized for tabular data.

kerasfactory.layers.TabularMoELayer

This module implements a TabularMoELayer (Mixture-of-Experts) that routes input features through multiple expert sub-networks and aggregates their outputs via a learnable gating mechanism. This approach is useful for tabular data where different experts can specialize in different feature patterns.

Classes

TabularMoELayer
1
2
3
4
5
6
TabularMoELayer(
    num_experts: int = 4,
    expert_units: int = 16,
    name: str | None = None,
    **kwargs: Any
)

Mixture-of-Experts layer for tabular data.

This layer routes input features through multiple expert sub-networks and aggregates their outputs via a learnable gating mechanism. Each expert is a small MLP, and the gate learns to weight their contributions.

Parameters:

Name Type Description Default
num_experts int

Number of expert networks. Default is 4.

4
expert_units int

Number of hidden units in each expert network. Default is 16.

16
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import keras
from kerasfactory.layers import TabularMoELayer

# Tabular data with 8 features
x = keras.random.normal((32, 8))

# Create the layer with 4 experts and 16 units per expert
moe_layer = TabularMoELayer(num_experts=4, expert_units=16)
y = moe_layer(x)
print("MoE output shape:", y.shape)  # Expected: (32, 8)

Initialize the TabularMoELayer.

Parameters:

Name Type Description Default
num_experts int

Number of expert networks.

4
expert_units int

Number of units in each expert.

16
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Source code in kerasfactory/layers/TabularMoELayer.py
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
def __init__(
    self,
    num_experts: int = 4,
    expert_units: int = 16,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the TabularMoELayer.

    Args:
        num_experts: Number of expert networks.
        expert_units: Number of units in each expert.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.num_experts = num_experts
    self.expert_units = expert_units

    # Initialize instance variables
    self.experts: list[Any] | None = None
    self.expert_outputs: list[Any] | None = None
    self.gate: Any | None = None

    # Validate parameters during initialization
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🏗️ BusinessRulesLayer

Layer for integrating domain-specific business rules into the model.

kerasfactory.layers.BusinessRulesLayer

This module implements a BusinessRulesLayer that allows applying configurable business rules to neural network outputs. This enables combining learned patterns with explicit domain knowledge.

Classes

BusinessRulesLayer
1
2
3
4
5
6
7
8
9
BusinessRulesLayer(
    rules: list[Rule],
    feature_type: str,
    trainable_weights: bool = True,
    weight_initializer: str
    | initializers.Initializer = "ones",
    name: str | None = None,
    **kwargs: Any
)

Evaluates business-defined rules for anomaly detection.

This layer applies user-defined business rules to detect anomalies. Rules can be defined for both numerical and categorical features.

For numerical features
  • Comparison operators: '>' and '<'
  • Example: [(">", 0), ("<", 100)] for range validation
For categorical features
  • Set operators: '==', 'in', '!=', 'not in'
  • Example: [("in", ["red", "green", "blue"])] for valid categories

Attributes:

Name Type Description
rules

List of rule tuples (operator, value).

feature_type

Type of feature ('numerical' or 'categorical').

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import tensorflow as tf
from kerasfactory.layers import BusinessRulesLayer

# Numerical rules
layer = BusinessRulesLayer(rules=[(">", 0), ("<", 100)], feature_type="numerical")
outputs = layer(tf.constant([[50.0], [-10.0]]))
print(outputs['business_anomaly'])  # [[False], [True]]

# Categorical rules
layer = BusinessRulesLayer(
    rules=[("in", ["red", "green"])],
    feature_type="categorical"
)
outputs = layer(tf.constant([["red"], ["blue"]]))
print(outputs['business_anomaly'])  # [[False], [True]]

Initializes the layer.

Parameters:

Name Type Description Default
rules list[Rule]

List of rule tuples (operator, value).

required
feature_type str

Type of feature ('numerical' or 'categorical').

required
trainable_weights bool

Whether to use trainable weights for soft rule enforcement. Default is True.

True
weight_initializer str | Initializer

Initializer for rule weights. Default is 'ones'.

'ones'
name str | None

Optional name for the layer.

None
**kwargs Any

Additional layer arguments.

{}

Raises:

Type Description
ValueError

If feature_type is invalid or rules have invalid operators.

Source code in kerasfactory/layers/BusinessRulesLayer.py
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
def __init__(
    self,
    rules: list[Rule],
    feature_type: str,
    trainable_weights: bool = True,
    weight_initializer: str | initializers.Initializer = "ones",
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initializes the layer.

    Args:
        rules: List of rule tuples (operator, value).
        feature_type: Type of feature ('numerical' or 'categorical').
        trainable_weights: Whether to use trainable weights for soft rule enforcement.
            Default is True.
        weight_initializer: Initializer for rule weights. Default is 'ones'.
        name: Optional name for the layer.
        **kwargs: Additional layer arguments.

    Raises:
        ValueError: If feature_type is invalid or rules have invalid operators.
    """
    # Set attributes before calling parent's __init__
    self._rules = rules
    self._feature_type = feature_type
    self._weights_trainable = trainable_weights
    self._weight_initializer = initializers.get(weight_initializer)

    # Validate feature type
    if feature_type not in ["numerical", "categorical"]:
        raise ValueError(
            f"Invalid feature_type: {feature_type}. "
            "Must be 'numerical' or 'categorical'",
        )

    super().__init__(name=name, **kwargs)

    # Set public attributes
    self.rules = self._rules
    self.feature_type = self._feature_type
    self.weights_trainable = self._weights_trainable
    self.weight_initializer = self._weight_initializer
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: tuple[int | None, int]
) -> dict[str, tuple[int | None, int]]

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int | None, int]

Input shape tuple.

required

Returns:

Type Description
dict[str, tuple[int | None, int]]

Dictionary mapping output names to their shapes.

Source code in kerasfactory/layers/BusinessRulesLayer.py
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
def compute_output_shape(
    self,
    input_shape: tuple[int | None, int],
) -> dict[str, tuple[int | None, int]]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Input shape tuple.

    Returns:
        Dictionary mapping output names to their shapes.
    """
    batch_size = input_shape[0]
    return {
        "business_score": (batch_size, 1),
        "business_proba": (batch_size, 1),
        "business_anomaly": (batch_size, 1),
        "business_reason": (batch_size, 1),
        "business_value": input_shape,
    }
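
The layer returns a dictionary of named outputs with the shapes listed above; a hedged usage sketch assuming those keys (input values are illustrative):

import tensorflow as tf
from kerasfactory.layers import BusinessRulesLayer

layer = BusinessRulesLayer(rules=[(">", 0), ("<", 100)], feature_type="numerical")
outputs = layer(tf.constant([[50.0], [150.0]]))

# Every output is (batch_size, 1) except business_value, which mirrors the input shape
for key in ("business_score", "business_proba", "business_anomaly", "business_reason"):
    print(key, outputs[key].shape)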

🛡️ Regularization & Robustness

🎲 StochasticDepth

Stochastic depth regularization for improved generalization.

kerasfactory.layers.StochasticDepth

Stochastic depth layer for neural networks.

Classes

StochasticDepth
1
2
3
4
5
StochasticDepth(
    survival_prob: float = 0.5,
    seed: int | None = None,
    **kwargs: dict[str, Any]
)

Stochastic depth layer for regularization.

This layer randomly drops entire residual branches with a specified probability during training. During inference, all branches are kept and scaled appropriately. This technique helps reduce overfitting and training time in deep networks.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
from keras import random, layers
from kerasfactory.layers import StochasticDepth

# Create sample residual branch
inputs = random.normal((32, 64, 64, 128))
residual = layers.Conv2D(128, 3, padding="same")(inputs)
residual = layers.BatchNormalization()(residual)
residual = layers.ReLU()(residual)

# Apply stochastic depth
outputs = StochasticDepth(survival_prob=0.8)([inputs, residual])
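
The same pattern applies to dense residual blocks on tabular features; a hedged sketch (layer sizes are illustrative):

import keras
from kerasfactory.layers import StochasticDepth

inputs = keras.Input(shape=(64,))
residual = keras.layers.Dense(64, activation="relu")(inputs)

# During training the residual branch is dropped with probability 1 - survival_prob
outputs = StochasticDepth(survival_prob=0.9)([inputs, residual])
model = keras.Model(inputs, outputs)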

Initialize stochastic depth.

Parameters:

Name Type Description Default
survival_prob float

Probability of keeping the residual branch (default: 0.5)

0.5
seed int | None

Random seed for reproducibility

None
**kwargs dict[str, Any]

Additional layer arguments

{}

Raises:

Type Description
ValueError

If survival_prob is not in [0, 1]

Source code in kerasfactory/layers/StochasticDepth.py
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def __init__(
    self,
    survival_prob: float = 0.5,
    seed: int | None = None,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize stochastic depth.

    Args:
        survival_prob: Probability of keeping the residual branch (default: 0.5)
        seed: Random seed for reproducibility
        **kwargs: Additional layer arguments

    Raises:
        ValueError: If survival_prob is not in [0, 1]
    """
    super().__init__(**kwargs)

    if not 0 <= survival_prob <= 1:
        raise ValueError(f"survival_prob must be in [0, 1], got {survival_prob}")

    self.survival_prob = survival_prob
    self.seed = seed

    # Create random generator with fixed seed
    self._rng = random.SeedGenerator(seed) if seed is not None else None
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: list[tuple[int, ...]]
) -> tuple[int, ...]

Compute output shape.

Parameters:

Name Type Description Default
input_shape list[tuple[int, ...]]

List of input shape tuples

required

Returns:

Type Description
tuple[int, ...]

Output shape tuple

Source code in kerasfactory/layers/StochasticDepth.py
100
101
102
103
104
105
106
107
108
109
110
111
112
def compute_output_shape(
    self,
    input_shape: list[tuple[int, ...]],
) -> tuple[int, ...]:
    """Compute output shape.

    Args:
        input_shape: List of input shape tuples

    Returns:
        Output shape tuple
    """
    return input_shape[0]
from_config classmethod
1
from_config(config: dict[str, Any]) -> StochasticDepth

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
StochasticDepth

StochasticDepth instance

Source code in kerasfactory/layers/StochasticDepth.py
129
130
131
132
133
134
135
136
137
138
139
@classmethod
def from_config(cls, config: dict[str, Any]) -> "StochasticDepth":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        StochasticDepth instance
    """
    return cls(**config)

🗑️ FeatureCutout

Feature cutout regularization for dropout-like effects on features.

kerasfactory.layers.FeatureCutout

Feature cutout regularization layer for neural networks.

Classes

FeatureCutout
1
2
3
4
5
6
FeatureCutout(
    cutout_prob: float = 0.1,
    noise_value: float = 0.0,
    seed: int | None = None,
    **kwargs: dict[str, Any]
)

Feature cutout regularization layer.

This layer randomly masks out (sets to zero) a specified fraction of features during training to improve model robustness and prevent overfitting. During inference, all features are kept intact.

Example
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
from keras import random
from kerasfactory.layers import FeatureCutout

# Create sample data
batch_size = 32
feature_dim = 10
inputs = random.normal((batch_size, feature_dim))

# Apply feature cutout
cutout = FeatureCutout(cutout_prob=0.2)
masked_outputs = cutout(inputs, training=True)
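
Masked positions can also be filled with a value other than zero via noise_value; a short hedged sketch (values are illustrative):

import keras
from kerasfactory.layers import FeatureCutout

x = keras.random.normal((32, 10))

# Replace dropped features with a sentinel value instead of zero
cutout = FeatureCutout(cutout_prob=0.2, noise_value=-1.0, seed=42)
masked = cutout(x, training=True)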

Initialize feature cutout.

Parameters:

Name Type Description Default
cutout_prob float

Probability of masking each feature

0.1
noise_value float

Value to use for masked features (default: 0.0)

0.0
seed int | None

Random seed for reproducibility

None
**kwargs dict[str, Any]

Additional layer arguments

{}

Raises:

Type Description
ValueError

If cutout_prob is not in [0, 1]

Source code in kerasfactory/layers/FeatureCutout.py
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def __init__(
    self,
    cutout_prob: float = 0.1,
    noise_value: float = 0.0,
    seed: int | None = None,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize feature cutout.

    Args:
        cutout_prob: Probability of masking each feature
        noise_value: Value to use for masked features (default: 0.0)
        seed: Random seed for reproducibility
        **kwargs: Additional layer arguments

    Raises:
        ValueError: If cutout_prob is not in [0, 1]
    """
    super().__init__(**kwargs)

    if not 0 <= cutout_prob <= 1:
        raise ValueError(f"cutout_prob must be in [0, 1], got {cutout_prob}")

    self.cutout_prob = cutout_prob
    self.noise_value = noise_value
    self.seed = seed

    # Create random generator with fixed seed
    self._rng = random.SeedGenerator(seed) if seed is not None else None
Functions
compute_output_shape
1
2
3
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute output shape.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

Input shape tuple

required

Returns:

Type Description
tuple[int, ...]

Output shape tuple

Source code in kerasfactory/layers/FeatureCutout.py
106
107
108
109
110
111
112
113
114
115
116
117
118
def compute_output_shape(
    self,
    input_shape: tuple[int, ...],
) -> tuple[int, ...]:
    """Compute output shape.

    Args:
        input_shape: Input shape tuple

    Returns:
        Output shape tuple
    """
    return input_shape
from_config classmethod
1
from_config(config: dict[str, Any]) -> FeatureCutout

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
FeatureCutout

FeatureCutout instance

Source code in kerasfactory/layers/FeatureCutout.py
136
137
138
139
140
141
142
143
144
145
146
@classmethod
def from_config(cls, config: dict[str, Any]) -> "FeatureCutout":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        FeatureCutout instance
    """
    return cls(**config)

🎯 SparseAttentionWeighting

Sparse attention weighting for computational efficiency.

kerasfactory.layers.SparseAttentionWeighting

Classes

SparseAttentionWeighting
1
2
3
4
5
SparseAttentionWeighting(
    num_modules: int,
    temperature: float = 1.0,
    **kwargs: dict[str, Any]
)

Sparse attention mechanism with temperature scaling for module outputs combination.

This layer implements a learnable attention mechanism that combines outputs from multiple modules using temperature-scaled attention weights. The attention weights are learned during training and can be made more or less sparse by adjusting the temperature parameter. A higher temperature leads to more uniform weights, while a lower temperature makes the weights more concentrated on specific modules.

Key features:
  1. Learnable module importance weights
  2. Temperature-controlled sparsity
  3. Softmax-based attention mechanism
  4. Support for a variable number of input features per module

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import keras
from keras import layers
from kerasfactory.layers import SparseAttentionWeighting

# Create sample module outputs
batch_size = 32
num_modules = 3
feature_dim = 64

# Shared input features for the three modules (input width is illustrative)
inputs = keras.random.normal((batch_size, 16))

# Create three different module outputs
module1 = layers.Dense(feature_dim)(inputs)
module2 = layers.Dense(feature_dim)(inputs)
module3 = layers.Dense(feature_dim)(inputs)

# Combine module outputs using sparse attention
attention = SparseAttentionWeighting(
    num_modules=num_modules,
    temperature=0.5  # Lower temperature for sharper attention
)
combined_output = attention([module1, module2, module3])

# The layer will learn which modules are most important
# and weight their outputs accordingly

Parameters:

Name Type Description Default
num_modules int

Number of input modules whose outputs will be combined.

required
temperature float

Temperature parameter for softmax scaling. Default is 1.0.
  • temperature > 1.0: more uniform attention weights
  • temperature < 1.0: more sparse attention weights
  • temperature = 1.0: standard softmax behavior

1.0

Initialize sparse attention weighting layer.

Parameters:

Name Type Description Default
num_modules int

Number of input modules to weight. Must be positive.

required
temperature float

Temperature parameter for softmax scaling. Must be positive. Controls the sparsity of attention weights:
  • Higher values (>1.0) lead to more uniform weights
  • Lower values (<1.0) lead to more concentrated weights

1.0
**kwargs dict[str, Any]

Additional layer arguments passed to the parent Layer class.

{}

Raises:

Type Description
ValueError

If num_modules <= 0 or temperature <= 0

Source code in kerasfactory/layers/SparseAttentionWeighting.py
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
def __init__(
    self,
    num_modules: int,
    temperature: float = 1.0,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize sparse attention weighting layer.

    Args:
        num_modules: Number of input modules to weight. Must be positive.
        temperature: Temperature parameter for softmax scaling. Must be positive.
            Controls the sparsity of attention weights:
            - Higher values (>1.0) lead to more uniform weights
            - Lower values (<1.0) lead to more concentrated weights
        **kwargs: Additional layer arguments passed to the parent Layer class.

    Raises:
        ValueError: If num_modules <= 0 or temperature <= 0
    """
    if num_modules <= 0:
        raise ValueError(f"num_modules must be positive, got {num_modules}")
    if temperature <= 0:
        raise ValueError(f"temperature must be positive, got {temperature}")

    super().__init__(**kwargs)
    self.num_modules = num_modules
    self.temperature = temperature

    # Learnable attention weights
    self.attention_weights = self.add_weight(
        shape=(num_modules,),
        initializer="ones",
        trainable=True,
        name="attention_weights",
    )
Functions
from_config classmethod
1
2
3
from_config(
    config: dict[str, Any]
) -> SparseAttentionWeighting

Create layer from configuration.

Parameters:

Name Type Description Default
config dict[str, Any]

Layer configuration dictionary

required

Returns:

Type Description
SparseAttentionWeighting

SparseAttentionWeighting instance

Source code in kerasfactory/layers/SparseAttentionWeighting.py
148
149
150
151
152
153
154
155
156
157
158
@classmethod
def from_config(cls, config: dict[str, Any]) -> "SparseAttentionWeighting":
    """Create layer from configuration.

    Args:
        config: Layer configuration dictionary

    Returns:
        SparseAttentionWeighting instance
    """
    return cls(**config)

🔧 Specialized Processing

🐢 SlowNetwork

Slow network layer for temporal smoothing and stability.

kerasfactory.layers.SlowNetwork

This module implements a SlowNetwork layer that processes features through multiple dense layers. It's designed to be used as a component in more complex architectures.

Classes

SlowNetwork
SlowNetwork(
    input_dim: int,
    num_layers: int = 3,
    units: int = 128,
    name: str | None = None,
    **kwargs: Any
)

A multi-layer network with configurable depth and width.

This layer processes input features through multiple dense layers with ReLU activations, and projects the output back to the original feature dimension.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_dim` | `int` | Dimension of the input features. | *required* |
| `num_layers` | `int` | Number of hidden layers. | `3` |
| `units` | `int` | Number of units per hidden layer. | `128` |
| `name` | `str \| None` | Optional name for the layer. | `None` |
Input shape

2D tensor with shape: (batch_size, input_dim)

Output shape

2D tensor with shape: (batch_size, input_dim) (same as input)

Example
import keras
from kerasfactory.layers import SlowNetwork

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Create the layer
slow_net = SlowNetwork(input_dim=16, num_layers=3, units=64)
y = slow_net(x)
print("Output shape:", y.shape)  # (32, 16)
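
The structure described above, hidden Dense layers with ReLU followed by a projection back to `input_dim`, can be approximated with plain Keras layers. The functional sketch below is for intuition only and is not the layer's actual implementation:

```python
import keras

def slow_network_sketch(input_dim: int, num_layers: int = 3, units: int = 128) -> keras.Model:
    inputs = keras.Input(shape=(input_dim,))
    x = inputs
    for _ in range(num_layers):
        x = keras.layers.Dense(units, activation="relu")(x)  # hidden layers
    outputs = keras.layers.Dense(input_dim)(x)  # project back to the input dimension
    return keras.Model(inputs, outputs)

sketch = slow_network_sketch(input_dim=16, num_layers=3, units=64)
print(sketch(keras.random.normal((32, 16))).shape)  # (32, 16)
```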

Initialize the SlowNetwork layer.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_dim` | `int` | Input dimension. | *required* |
| `num_layers` | `int` | Number of hidden layers. | `3` |
| `units` | `int` | Number of units in each layer. | `128` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/SlowNetwork.py (lines 47-77)
def __init__(
    self,
    input_dim: int,
    num_layers: int = 3,
    units: int = 128,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the SlowNetwork layer.

    Args:
        input_dim: Input dimension.
        num_layers: Number of hidden layers.
        units: Number of units in each layer.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.input_dim = input_dim
    self.num_layers = num_layers
    self.units = units

    # Initialize instance variables
    self.hidden_layers: list[Any] | None = None
    self.output_layer: Any | None = None

    # Validate parameters
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

⚡ HyperZZWOperator

Hyper-kernel operator that computes context-dependent transformations of its input.

kerasfactory.layers.HyperZZWOperator

This module implements a HyperZZWOperator layer that computes context-dependent weights by multiplying inputs with hyper-kernels. This is a specialized layer for the Terminator model.

Classes

HyperZZWOperator
HyperZZWOperator(
    input_dim: int,
    context_dim: int | None = None,
    name: str | None = None,
    **kwargs: Any
)

A layer that computes context-dependent weights by multiplying inputs with hyper-kernels.

This layer takes two inputs: the original input tensor and a context tensor. It generates hyper-kernels from the context and performs a context-dependent transformation of the input.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_dim` | `int` | Dimension of the input features. | *required* |
| `context_dim` | `int \| None` | Optional dimension of the context features. If not provided, it will be inferred. | `None` |
| `name` | `str \| None` | Optional name for the layer. | `None` |

Input

A list of two tensors:

- inputs[0]: Input tensor with shape (batch_size, input_dim).
- inputs[1]: Context tensor with shape (batch_size, context_dim).

Output shape

2D tensor with shape: (batch_size, input_dim) (same as input)

Example
import keras
from kerasfactory.layers import HyperZZWOperator

# Create sample input data
inputs = keras.random.normal((32, 16))  # 32 samples, 16 features
context = keras.random.normal((32, 8))  # 32 samples, 8 context features

# Create the layer
zzw_op = HyperZZWOperator(input_dim=16, context_dim=8)
context_weights = zzw_op([inputs, context])
print("Output shape:", context_weights.shape)  # (32, 16)
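
The hyper-kernel idea, deriving a transformation from the context and applying it to the input, can be sketched with plain Keras layers. The snippet below is a conceptual illustration under that assumption, not the operator's actual implementation:

```python
import keras

inputs = keras.random.normal((32, 16))   # input features
context = keras.random.normal((32, 8))   # context features

# Hypothetical sketch: generate a per-sample "hyper-kernel" from the context,
# then modulate the input with it elementwise.
kernel_generator = keras.layers.Dense(16)  # context -> hyper-kernel of size input_dim
hyper_kernel = kernel_generator(context)   # shape (32, 16)
context_weights = inputs * hyper_kernel    # shape (32, 16), same as the input
print(context_weights.shape)
```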

Initialize the HyperZZWOperator.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_dim` | `int` | Input dimension. | *required* |
| `context_dim` | `int \| None` | Context dimension. | `None` |
| `name` | `str \| None` | Name of the layer. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/HyperZZWOperator.py (lines 50-73)
def __init__(
    self,
    input_dim: int,
    context_dim: int | None = None,
    name: str | None = None,
    **kwargs: Any,
) -> None:
    """Initialize the HyperZZWOperator.

    Args:
        input_dim: Input dimension.
        context_dim: Context dimension.
        name: Name of the layer.
        **kwargs: Additional keyword arguments.
    """
    # Set public attributes
    self.input_dim = input_dim
    self.context_dim = context_dim

    # Validate parameters
    self._validate_params()

    # Call parent's __init__
    super().__init__(name=name, **kwargs)

🚨 Anomaly Detection

📉 NumericalAnomalyDetection

Detects anomalies in numerical features using learned feature distributions and autoencoder reconstruction error.

kerasfactory.layers.NumericalAnomalyDetection

Classes

NumericalAnomalyDetection
NumericalAnomalyDetection(
    hidden_dims: list[int],
    reconstruction_weight: float = 0.5,
    distribution_weight: float = 0.5,
    **kwargs: dict[str, Any]
)

Numerical anomaly detection layer for identifying outliers in numerical features.

This layer learns a distribution for each numerical feature and outputs an anomaly score for each feature based on how far it deviates from the learned distribution. The layer uses a combination of mean, variance, and autoencoder reconstruction error to detect anomalies.

Example
import keras
from kerasfactory.layers import NumericalAnomalyDetection

# Suppose we have 5 numerical features
x = keras.random.normal((32, 5))  # Batch of 32 samples
# Create a NumericalAnomalyDetection layer
anomaly_layer = NumericalAnomalyDetection(
    hidden_dims=[8, 4],
    reconstruction_weight=0.5,
    distribution_weight=0.5
)
anomaly_scores = anomaly_layer(x)
print("Anomaly scores shape:", anomaly_scores.shape)  # Expected: (32, 5)
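
Conceptually, the anomaly score blends two signals: how far a value deviates from a learned per-feature distribution and how poorly an autoencoder reconstructs it. A minimal sketch of that weighting with hypothetical inputs (not the layer's internal code):

```python
import numpy as np

def combined_score(x, mean, var, reconstruction,
                   reconstruction_weight=0.5, distribution_weight=0.5):
    dist_error = (x - mean) ** 2 / (var + 1e-8)  # distribution-based error (squared z-score)
    recon_error = (x - reconstruction) ** 2      # autoencoder reconstruction error
    return reconstruction_weight * recon_error + distribution_weight * dist_error

x = np.array([0.1, 5.0])  # the second value sits far from the learned mean
print(combined_score(x, mean=0.0, var=1.0, reconstruction=np.array([0.1, 0.5])))
```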

Initialize the layer.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `hidden_dims` | `list[int]` | List of hidden dimensions for the autoencoder. | *required* |
| `reconstruction_weight` | `float` | Weight for reconstruction error in the anomaly score. | `0.5` |
| `distribution_weight` | `float` | Weight for distribution-based error in the anomaly score. | `0.5` |
| `**kwargs` | `dict[str, Any]` | Additional keyword arguments. | `{}` |
Source code in kerasfactory/layers/NumericalAnomalyDetection.py (lines 36-54)
def __init__(
    self,
    hidden_dims: list[int],
    reconstruction_weight: float = 0.5,
    distribution_weight: float = 0.5,
    **kwargs: dict[str, Any],
) -> None:
    """Initialize the layer.

    Args:
        hidden_dims: List of hidden dimensions for the autoencoder.
        reconstruction_weight: Weight for reconstruction error in anomaly score.
        distribution_weight: Weight for distribution-based error in anomaly score.
        **kwargs: Additional keyword arguments.
    """
    self.hidden_dims = hidden_dims
    self.reconstruction_weight = reconstruction_weight
    self.distribution_weight = distribution_weight
    super().__init__(**kwargs)
Functions
compute_output_shape
compute_output_shape(
    input_shape: tuple[int, ...]
) -> tuple[int, ...]

Compute output shape.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_shape` | `tuple[int, ...]` | Input shape tuple. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `tuple[int, ...]` | Output shape tuple. |

Source code in kerasfactory/layers/NumericalAnomalyDetection.py (lines 134-143)
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Compute output shape.

    Args:
        input_shape: Input shape tuple.

    Returns:
        Output shape tuple.
    """
    return input_shape

📊 CategoricalAnomalyDetectionLayer

Detects anomalies in categorical features.

kerasfactory.layers.CategoricalAnomalyDetectionLayer

Classes

CategoricalAnomalyDetectionLayer
CategoricalAnomalyDetectionLayer(
    dtype: str = "string", **kwargs
)

Backend-agnostic anomaly detection for categorical features.

This layer detects anomalies in categorical features by checking if values belong to a predefined set of valid categories. Values not in this set are considered anomalous.

The layer uses a Keras StringLookup or IntegerLookup layer internally to efficiently map input values to indices, which are then used to determine if a value is valid.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| `dtype` | `Any` | The data type of input values ('string' or 'int32'). |
| `lookup` | `StringLookup \| IntegerLookup \| None` | A Keras lookup layer for mapping values to indices. |
| `vocabulary` | `list[str \| int] \| None` | List of valid categorical values. |

Example
import tensorflow as tf
from kerasfactory.layers import CategoricalAnomalyDetectionLayer

layer = CategoricalAnomalyDetectionLayer(dtype='string')
layer.initialize_from_stats(vocabulary=['red', 'green', 'blue'])
outputs = layer(tf.constant([['red'], ['purple']]))
print(outputs['anomaly'])  # [[False], [True]]
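
Under the hood, the valid/invalid decision comes down to whether the lookup maps a value to the out-of-vocabulary index. A standalone sketch of that mechanism with a raw StringLookup (not the layer itself):

```python
import numpy as np
from keras import layers

lookup = layers.StringLookup(output_mode="int", num_oov_indices=1)
lookup.adapt(np.array([["red"], ["green"], ["blue"]]))

indices = np.asarray(lookup(np.array([["red"], ["purple"]])))
is_anomaly = indices == 0  # index 0 is reserved for out-of-vocabulary values
print(is_anomaly)  # "purple" is not in the vocabulary, so it is flagged
```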

Initializes the layer.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `dtype` | `str` | Data type of input values ('string' or 'int32'). | `'string'` |
| `**kwargs` | | Additional layer arguments. | `{}` |

Raises:

| Type | Description |
|------|-------------|
| `ValueError` | If dtype is not 'string' or 'int32'. |

Source code in kerasfactory/layers/CategoricalAnomalyDetectionLayer.py (lines 42-56)
def __init__(self, dtype: str = "string", **kwargs) -> None:
    """Initializes the layer.

    Args:
        dtype: Data type of input values ('string' or 'int32'). Defaults to 'string'.
        **kwargs: Additional layer arguments.

    Raises:
        ValueError: If dtype is not 'string' or 'int32'.
    """
    self._dtype = None  # Initialize private attribute
    self.lookup: layers.StringLookup | layers.IntegerLookup | None = None
    self.built = False
    super().__init__(**kwargs)
    self.set_dtype(dtype.lower())  # Use setter method
Attributes
dtype property
dtype: Any

Get the dtype of the layer.

Functions
set_dtype
set_dtype(value) -> None

Set the dtype and initialize the appropriate lookup layer.

Source code in kerasfactory/layers/CategoricalAnomalyDetectionLayer.py (lines 81-97)
def set_dtype(self, value) -> None:
    """Set the dtype and initialize the appropriate lookup layer."""
    self._dtype = value
    if self._dtype == "string":
        self.lookup = layers.StringLookup(
            output_mode="int",
            num_oov_indices=1,
            name="string_lookup",
        )
    elif self._dtype == "int":
        self.lookup = layers.IntegerLookup(
            output_mode="int",
            num_oov_indices=1,
            name="int_lookup",
        )
    else:
        raise ValueError(f"Unsupported dtype: {value}")
initialize_from_stats
initialize_from_stats(vocabulary: list[str | int]) -> None

Initializes the layer with a vocabulary of valid values.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `vocabulary` | `list[str \| int]` | List of valid categorical values. | *required* |
Source code in kerasfactory/layers/CategoricalAnomalyDetectionLayer.py (lines 99-115)
def initialize_from_stats(self, vocabulary: list[str | int]) -> None:
    """Initializes the layer with a vocabulary of valid values.

    Args:
        vocabulary: list of valid categorical values.
    """
    # Convert vocabulary to numpy array
    # For empty vocabulary, add a dummy value that will never match
    vocab_array = (
        np.array(["__EMPTY_VOCABULARY__"])
        if not vocabulary
        else np.array(vocabulary)
    )

    # Initialize the lookup layer with the vocabulary
    self.lookup.adapt(vocab_array.reshape(-1, 1))
    logger.info("Categorical layer initialized with vocabulary: {}", vocabulary)
compute_output_shape
compute_output_shape(
    input_shape: tuple[int | None, int]
) -> dict[str, tuple[int | None, int]]

Compute the output shape of the layer.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_shape` | `tuple[int \| None, int]` | Input shape tuple. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `dict[str, tuple[int \| None, int]]` | Dictionary mapping output names to their shapes. |

Source code in kerasfactory/layers/CategoricalAnomalyDetectionLayer.py (lines 117-137)
def compute_output_shape(
    self,
    input_shape: tuple[int | None, int],
) -> dict[str, tuple[int | None, int]]:
    """Compute the output shape of the layer.

    Args:
        input_shape: Input shape tuple.

    Returns:
        Dictionary mapping output names to their shapes.
    """
    batch_size = input_shape[0]
    return {
        "score": (batch_size, 1),
        "proba": (batch_size, 1),
        "threshold": (1, 1),
        "anomaly": (batch_size, 1),
        "reason": (batch_size, 1),
        "value": input_shape,
    }
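
Downstream code typically indexes into the returned dictionary using the keys listed above. A small usage sketch, assuming the layer's call returns the same keys that `compute_output_shape` declares:

```python
import tensorflow as tf
from kerasfactory.layers import CategoricalAnomalyDetectionLayer

layer = CategoricalAnomalyDetectionLayer(dtype="string")
layer.initialize_from_stats(vocabulary=["red", "green", "blue"])

outputs = layer(tf.constant([["red"], ["purple"]]))
print(outputs["anomaly"])  # per-row boolean anomaly flags
print(outputs["score"])    # per-row anomaly scores
print(outputs["value"])    # passthrough values (shape matches the input)
```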
from_config classmethod
from_config(config) -> Any

Create layer from configuration.

Source code in kerasfactory/layers/CategoricalAnomalyDetectionLayer.py (lines 202-212)
@classmethod
def from_config(cls, config) -> Any:
    """Create layer from configuration."""
    # Get vocabulary from config
    vocabulary = config.pop("vocabulary", [])
    # Create layer instance
    layer = cls(**config)
    # Initialize vocabulary
    if vocabulary:
        layer.initialize_from_stats(vocabulary)
    return layer
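
Because `from_config` pops `vocabulary` from the config and re-initializes the lookup, a layer can be rebuilt directly from a plain config dict. A minimal sketch using only the keys the code above consumes:

```python
from kerasfactory.layers import CategoricalAnomalyDetectionLayer

config = {"dtype": "string", "vocabulary": ["red", "green", "blue"]}
restored = CategoricalAnomalyDetectionLayer.from_config(config)
print(restored.dtype)  # string
```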