🧩 Layers API Reference

Welcome to the KMR Layers documentation! All layers are designed to work exclusively with Keras 3 and follow consistent patterns for easy integration.

Quick Navigation

  • 🎯 Most Popular: Start with TabularAttention, DistributionTransformLayer, GatedFeatureFusion
  • πŸ”§ Feature Engineering: DateEncodingLayer, VariableSelection, BusinessRulesLayer
  • 🧠 Advanced: AdvancedNumericalEmbedding, TransformerBlock, StochasticDepth

Keras 3 Native

All layers are built exclusively for Keras 3 with no TensorFlow dependencies in production code.

πŸ“š Layer Categories

  • Core Layers: Essential layers for building tabular models with attention mechanisms and feature processing.
  • Feature Engineering: Layers for data preprocessing, transformation, and feature engineering tasks.
  • Attention Mechanisms: Advanced attention layers for capturing complex feature relationships.
  • Specialized Layers: Specialized layers for specific use cases like anomaly detection and boosting.

🎯 Core Layers

🧠 TabularAttention

Dual attention mechanism for inter-feature and inter-sample relationships in tabular data.

kmr.layers.TabularAttention

This module implements a TabularAttention layer that applies inter-feature and inter-sample attention mechanisms for tabular data. It's particularly useful for capturing complex relationships between features and samples in tabular datasets.

Classes

TabularAttention

TabularAttention(
    num_heads,
    d_model,
    dropout_rate=0.1,
    name=None,
    **kwargs
)

Custom layer to apply inter-feature and inter-sample attention for tabular data.

This layer implements a dual attention mechanism:

  1. Inter-feature attention: Captures dependencies between features for each sample.
  2. Inter-sample attention: Captures dependencies between samples for each feature.

The layer uses MultiHeadAttention for both attention mechanisms and includes layer normalization, dropout, and a feed-forward network.

Parameters:

  • num_heads (int): Number of attention heads. Required.
  • d_model (int): Dimensionality of the attention model. Required.
  • dropout_rate (float): Dropout rate for regularization. Default: 0.1.
  • name (str): Name for the layer. Default: None.

Input shape

Tensor with shape: (batch_size, num_samples, num_features)

Output shape

Tensor with shape: (batch_size, num_samples, d_model)

Example
import keras
from kmr.layers import TabularAttention

# Create sample input data
x = keras.random.normal((32, 100, 20))  # 32 batches, 100 samples, 20 features

# Apply tabular attention
attention = TabularAttention(num_heads=4, d_model=32, dropout_rate=0.1)
y = attention(x)
print("Output shape:", y.shape)  # (32, 100, 32)

Initialize the TabularAttention layer.

Parameters:

  • num_heads (int): Number of attention heads. Required.
  • d_model (int): Model dimension. Required.
  • dropout_rate (float): Dropout rate. Default: 0.1.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int, ...]): Shape of the input tensor. Required.

Returns:

  • tuple[int, ...]: Shape of the output tensor.

πŸ”’ AdvancedNumericalEmbedding

Advanced numerical feature embedding with dual-branch architecture (continuous + discrete).

kmr.layers.AdvancedNumericalEmbedding

This module implements an AdvancedNumericalEmbedding layer that embeds continuous numerical features into a higher-dimensional space using a combination of continuous and discrete branches.

Classes

AdvancedNumericalEmbedding

AdvancedNumericalEmbedding(
    embedding_dim=8,
    mlp_hidden_units=16,
    num_bins=10,
    init_min=-3.0,
    init_max=3.0,
    dropout_rate=0.1,
    use_batch_norm=True,
    name=None,
    **kwargs
)

Advanced numerical embedding layer for continuous features.

This layer embeds each continuous numerical feature into a higher-dimensional space by combining two branches:

  1. Continuous Branch: Each feature is processed via a small MLP.
  2. Discrete Branch: Each feature is discretized into bins using learnable min/max boundaries and then an embedding is looked up for its bin.

A learnable gate combines the two branch outputs per feature and per embedding dimension. Additionally, the continuous branch uses a residual connection and optional batch normalization to improve training stability.

Parameters:

  • embedding_dim (int): Output embedding dimension per feature. Default: 8.
  • mlp_hidden_units (int): Hidden units for the continuous branch MLP. Default: 16.
  • num_bins (int): Number of bins for discretization. Default: 10.
  • init_min (float or list): Initial minimum values for discretization boundaries. If a scalar is provided, it is applied to all features. Default: -3.0.
  • init_max (float or list): Initial maximum values for discretization boundaries. Default: 3.0.
  • dropout_rate (float): Dropout rate applied to the continuous branch. Default: 0.1.
  • use_batch_norm (bool): Whether to apply batch normalization to the continuous branch. Default: True.
  • name (str): Name for the layer. Default: None.

Input shape

Tensor with shape: (batch_size, num_features)

Output shape

Tensor with shape: (batch_size, num_features, embedding_dim) or (batch_size, embedding_dim) if num_features=1

Example
import keras
from kmr.layers import AdvancedNumericalEmbedding

# Create sample input data
x = keras.random.normal((32, 5))  # 32 samples, 5 features

# Create the layer
embedding = AdvancedNumericalEmbedding(
    embedding_dim=8,
    mlp_hidden_units=16,
    num_bins=10
)
y = embedding(x)
print("Output shape:", y.shape)  # (32, 5, 8)

Initialize the AdvancedNumericalEmbedding layer.

Parameters:

  • embedding_dim (int): Embedding dimension. Default: 8.
  • mlp_hidden_units (int): Hidden units in MLP. Default: 16.
  • num_bins (int): Number of bins for discretization. Default: 10.
  • init_min (float | list[float]): Minimum initialization value. Default: -3.0.
  • init_max (float | list[float]): Maximum initialization value. Default: 3.0.
  • dropout_rate (float): Dropout rate. Default: 0.1.
  • use_batch_norm (bool): Whether to use batch normalization. Default: True.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int, ...]): Shape of the input tensor. Required.

Returns:

  • tuple[int, ...]: Shape of the output tensor.

πŸ”€ GatedFeatureFusion

Gated mechanism for intelligently fusing multiple feature representations.

kmr.layers.GatedFeatureFusion

This module implements a GatedFeatureFusion layer that combines two feature representations through a learned gating mechanism. It's particularly useful for tabular datasets with multiple representations (e.g., raw numeric features alongside embeddings).

Classes

GatedFeatureFusion

GatedFeatureFusion(
    activation="sigmoid", name=None, **kwargs
)

Gated feature fusion layer for combining two feature representations.

This layer takes two inputs (e.g., numerical features and their embeddings) and fuses them using a learned gate to balance their contributions. The gate is computed using a dense layer with sigmoid activation, applied to the concatenation of both inputs.

Parameters:

  • activation (str): Activation function to use for the gate. Default: 'sigmoid'.
  • name (str | None): Optional name for the layer. Default: None.

Input shape

A list of 2 tensors with shape: [(batch_size, ..., features), (batch_size, ..., features)]. Both inputs must have the same shape.

Output shape

Tensor with shape: (batch_size, ..., features), same as each input.

Example
import keras
from kmr.layers import GatedFeatureFusion

# Two representations for the same 10 features
feat1 = keras.random.normal((32, 10))
feat2 = keras.random.normal((32, 10))

fusion_layer = GatedFeatureFusion()
fused = fusion_layer([feat1, feat2])
print("Fused output shape:", fused.shape)  # Expected: (32, 10)

Initialize the GatedFeatureFusion layer.

Parameters:

  • activation (str): Activation function for the gate. Default: 'sigmoid'.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

🎯 VariableSelection

Intelligent variable selection network for identifying important features.

kmr.layers.VariableSelection

This module implements a VariableSelection layer that applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It's particularly useful for dynamic feature selection in time series and tabular models.

Classes

VariableSelection

VariableSelection(
    nr_features,
    units,
    dropout_rate=0.1,
    use_context=False,
    name=None,
    **kwargs
)

Layer for dynamic feature selection using gated residual networks.

This layer applies a gated residual network to each feature independently and learns feature weights through a softmax layer. It can optionally use a context vector to condition the feature selection.

Parameters:

  • nr_features (int): Number of input features. Required.
  • units (int): Number of hidden units in the gated residual network. Required.
  • dropout_rate (float): Dropout rate for regularization. Default: 0.1.
  • use_context (bool): Whether to use a context vector for conditioning. Default: False.
  • name (str): Name for the layer. Default: None.

Input shape

If use_context is False:
  • Single tensor with shape: (batch_size, nr_features, feature_dim)

If use_context is True:
  • List of two tensors:
    • Features tensor with shape: (batch_size, nr_features, feature_dim)
    • Context tensor with shape: (batch_size, context_dim)

Output shape

Tuple of two tensors:
  • Selected features: (batch_size, feature_dim)
  • Feature weights: (batch_size, nr_features)

Example
import keras
from kmr.layers import VariableSelection

# Create sample input data
x = keras.random.normal((32, 10, 16))  # 32 batches, 10 features, 16 dims per feature

# Without context
vs = VariableSelection(nr_features=10, units=32, dropout_rate=0.1)
selected, weights = vs(x)
print("Selected features shape:", selected.shape)  # (32, 16)
print("Feature weights shape:", weights.shape)  # (32, 10)

# With context
context = keras.random.normal((32, 64))  # 32 batches, 64-dim context
vs_context = VariableSelection(nr_features=10, units=32, dropout_rate=0.1, use_context=True)
selected, weights = vs_context([x, context])

Initialize the VariableSelection layer.

Parameters:

  • nr_features (int): Number of input features. Required.
  • units (int): Number of units in the selection network. Required.
  • dropout_rate (float): Dropout rate. Default: 0.1.
  • use_context (bool): Whether to use context for selection. Default: False.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int, ...] | list[tuple[int, ...]]): Shape of the input tensor, or list of shapes if using context. Required.

Returns:

  • list[tuple[int, ...]]: List of shapes for the output tensors.

πŸ”„ TransformerBlock

Standard transformer block with multi-head attention and feed-forward networks.

kmr.layers.TransformerBlock

This module implements a TransformerBlock layer that applies transformer-style self-attention and feed-forward processing to input tensors. It's particularly useful for capturing complex relationships in tabular data.

Classes

TransformerBlock

TransformerBlock(
    dim_model=32,
    num_heads=3,
    ff_units=16,
    dropout_rate=0.2,
    name=None,
    **kwargs
)

Transformer block with multi-head attention and feed-forward layers.

This layer implements a standard transformer block with multi-head self-attention followed by a feed-forward network, with residual connections and layer normalization.

Parameters:

  • dim_model (int): Dimensionality of the model. Default: 32.
  • num_heads (int): Number of attention heads. Default: 3.
  • ff_units (int): Number of units in the feed-forward network. Default: 16.
  • dropout_rate (float): Dropout rate for regularization. Default: 0.2.
  • name (str): Name for the layer. Default: None.

Input shape

Tensor with shape: (batch_size, sequence_length, dim_model) or (batch_size, dim_model) which will be automatically reshaped.

Output shape

Tensor with shape: (batch_size, sequence_length, dim_model) or (batch_size, dim_model) matching the input shape.

Example
import keras
from kmr.layers import TransformerBlock

# Create sample input data
x = keras.random.normal((32, 10, 64))  # 32 samples, 10 time steps, 64 features

# Apply transformer block
transformer = TransformerBlock(dim_model=64, num_heads=4, ff_units=128, dropout_rate=0.1)
y = transformer(x)
print("Output shape:", y.shape)  # (32, 10, 64)

Initialize the TransformerBlock layer.

Parameters:

  • dim_model (int): Model dimension. Default: 32.
  • num_heads (int): Number of attention heads. Default: 3.
  • ff_units (int): Feed-forward units. Default: 16.
  • dropout_rate (float): Dropout rate. Default: 0.2.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int, ...]): Shape of the input tensor. Required.

Returns:

  • tuple[int, ...]: Shape of the output tensor.

🎲 StochasticDepth

Stochastic depth regularization for improved training and generalization.

kmr.layers.StochasticDepth

Stochastic depth layer for neural networks.

Classes

StochasticDepth

StochasticDepth(survival_prob=0.5, seed=None, **kwargs)

Stochastic depth layer for regularization.

This layer randomly drops entire residual branches with a specified probability during training. During inference, all branches are kept and scaled appropriately. This technique helps reduce overfitting and training time in deep networks.

Example
from keras import random, layers
from kmr.layers import StochasticDepth

# Create sample residual branch
inputs = random.normal((32, 64, 64, 128))
residual = layers.Conv2D(128, 3, padding="same")(inputs)
residual = layers.BatchNormalization()(residual)
residual = layers.ReLU()(residual)

# Apply stochastic depth
outputs = StochasticDepth(survival_prob=0.8)([inputs, residual])

Initialize stochastic depth.

Parameters:

  • survival_prob (float): Probability of keeping the residual branch. Default: 0.5.
  • seed (int | None): Random seed for reproducibility. Default: None.
  • **kwargs (dict[str, Any]): Additional layer arguments.

Raises:

  • ValueError: If survival_prob is not in [0, 1].

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute output shape.

Parameters:

  • input_shape (list[tuple[int, ...]]): List of input shape tuples. Required.

Returns:

  • tuple[int, ...]: Output shape tuple.

from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • StochasticDepth: StochasticDepth instance.

πŸ”§ Feature Engineering Layers

πŸ“Š DistributionTransformLayer

Automatic distribution transformation for numerical features to improve model performance.

kmr.layers.DistributionTransformLayer

This module implements a DistributionTransformLayer that applies various transformations to make data more normally distributed or to handle specific distribution types better. It's particularly useful for preprocessing data before anomaly detection or other statistical analyses.

Classes

DistributionTransformLayer

DistributionTransformLayer(
    transform_type="none",
    lambda_param=0.0,
    epsilon=1e-10,
    min_value=0.0,
    max_value=1.0,
    clip_values=True,
    auto_candidates=None,
    name=None,
    **kwargs
)

Layer for transforming data distributions to improve anomaly detection.

This layer applies various transformations to make data more normally distributed or to handle specific distribution types better. Supported transformations include log, square root, Box-Cox, Yeo-Johnson, arcsinh, cube-root, logit, quantile, robust-scale, and min-max.

When transform_type is set to 'auto', the layer automatically selects the most appropriate transformation based on the data characteristics during training.

Parameters:

  • transform_type (str): Type of transformation to apply. Options are 'none', 'log', 'sqrt', 'box-cox', 'yeo-johnson', 'arcsinh', 'cube-root', 'logit', 'quantile', 'robust-scale', 'min-max', or 'auto'. Default: 'none'.
  • lambda_param (float): Parameter for parameterized transformations like Box-Cox and Yeo-Johnson. Default: 0.0.
  • epsilon (float): Small value added to prevent numerical issues like log(0). Default: 1e-10.
  • min_value (float): Minimum value for min-max scaling. Default: 0.0.
  • max_value (float): Maximum value for min-max scaling. Default: 1.0.
  • clip_values (bool): Whether to clip values to the specified range in min-max scaling. Default: True.
  • auto_candidates (list[str] | None): List of transformation types to consider when transform_type is 'auto'. If None, all available transformations will be considered. Default: None.
  • name (str | None): Optional name for the layer. Default: None.

Input shape

N-D tensor with shape: (batch_size, ..., features)

Output shape

Same shape as input: (batch_size, ..., features)

Example
import keras
import numpy as np
from kmr.layers import DistributionTransformLayer

# Create sample input data with a skewed (log-normal) distribution
x = keras.ops.exp(keras.random.normal((32, 10)))  # 32 samples, 10 features

# Apply log transformation
log_transform = DistributionTransformLayer(transform_type="log")
y = log_transform(x)
print("Transformed output shape:", y.shape)  # (32, 10)

# Apply Box-Cox transformation with lambda=0.5
box_cox = DistributionTransformLayer(transform_type="box-cox", lambda_param=0.5)
z = box_cox(x)

# Apply arcsinh transformation (handles both positive and negative values)
arcsinh_transform = DistributionTransformLayer(transform_type="arcsinh")
a = arcsinh_transform(x)

# Apply min-max scaling to range [0, 1]
min_max = DistributionTransformLayer(transform_type="min-max", min_value=0.0, max_value=1.0)
b = min_max(x)

# Use automatic transformation selection
auto_transform = DistributionTransformLayer(transform_type="auto")
c = auto_transform(x)  # Will select the best transformation during training

Initialize the DistributionTransformLayer.

Parameters:

  • transform_type (str): Type of transformation to apply. Default: 'none'.
  • lambda_param (float): Lambda parameter for Box-Cox transformation. Default: 0.0.
  • epsilon (float): Small value to avoid division by zero. Default: 1e-10.
  • min_value (float): Minimum value for clipping. Default: 0.0.
  • max_value (float): Maximum value for clipping. Default: 1.0.
  • clip_values (bool): Whether to clip values. Default: True.
  • auto_candidates (list[str] | None): List of candidate transformations for auto mode. Default: None.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

πŸ“… DateEncodingLayer

Comprehensive date and time feature encoding with multiple temporal representations.

kmr.layers.DateEncodingLayer

DateEncodingLayer for encoding date components into cyclical features.

This layer takes date components (year, month, day, day of week) and encodes them into cyclical features using sine and cosine transformations.

Classes

DateEncodingLayer

DateEncodingLayer(min_year=1900, max_year=2100, **kwargs)

Layer for encoding date components into cyclical features.

This layer takes date components (year, month, day, day of week) and encodes them into cyclical features using sine and cosine transformations. The year is normalized to a range between 0 and 1 based on min_year and max_year.

Parameters:

  • min_year (int): Minimum year for normalization. Default: 1900.
  • max_year (int): Maximum year for normalization. Default: 2100.
  • **kwargs: Additional layer arguments.

Input shape

Tensor with shape: (..., 4) containing [year, month, day, day_of_week]

Output shape

Tensor with shape: (..., 8) containing cyclical encodings: [year_sin, year_cos, month_sin, month_cos, day_sin, day_cos, dow_sin, dow_cos]
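
Example

A minimal usage sketch (the exact input dtype is an assumption; date components are supplied here as a numeric tensor, as produced by DateParsingLayer below):

import keras
from kmr.layers import DateEncodingLayer

# Two dates as [year, month, day, day_of_week] components
dates = keras.ops.convert_to_tensor(
    [[2023.0, 6.0, 15.0, 3.0],
     [1999.0, 12.0, 31.0, 4.0]]
)

encoder = DateEncodingLayer(min_year=1900, max_year=2100)
encoded = encoder(dates)
print("Encoded shape:", encoded.shape)  # (2, 8)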

Initialize the layer.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape: Shape of the input tensor. Required.

Returns:

  • tuple[int, ...]: Output shape.

πŸ” DateParsingLayer

Flexible date parsing and extraction from various date formats and strings.

kmr.layers.DateParsingLayer

Date Parsing Layer for Keras 3.

This module provides a layer for parsing date strings into numerical components.

Classes

DateParsingLayer

DateParsingLayer(date_format='YYYY-MM-DD', **kwargs)

Layer for parsing date strings into numerical components.

This layer takes date strings in a specified format and returns a tensor containing the year, month, day of the month, and day of the week.

Parameters:

  • date_format (str): Format of the date strings. Currently supports 'YYYY-MM-DD' and 'YYYY/MM/DD'. Default: 'YYYY-MM-DD'.
  • **kwargs: Additional keyword arguments to pass to the base layer.

Input shape

String tensor of any shape.

Output shape

Same as input shape with an additional dimension of size 4 appended. For example, if input shape is [batch_size], output shape will be [batch_size, 4].
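
Example

A minimal usage sketch (assuming the date strings are supplied as a NumPy string array):

import numpy as np
from kmr.layers import DateParsingLayer

dates = np.array(["2023-06-15", "1999-12-31"])

parser = DateParsingLayer(date_format="YYYY-MM-DD")
components = parser(dates)
print("Components shape:", components.shape)  # (2, 4): [year, month, day, day_of_week]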

Initialize the layer.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape: Shape of the input tensor. Required.

Returns:

  • tuple[int, ...]: Shape of the output tensor.

🌸 SeasonLayer

Seasonal feature extraction from date/time data for temporal pattern recognition.

kmr.layers.SeasonLayer

SeasonLayer for adding seasonal information based on month.

This layer adds seasonal information based on the month, encoding it as a one-hot vector for the four seasons: Winter, Spring, Summer, and Fall.

Classes

SeasonLayer

SeasonLayer(**kwargs)

Layer for adding seasonal information based on month.

This layer adds seasonal information based on the month, encoding it as a one-hot vector for the four seasons: Winter, Spring, Summer, and Fall.

Parameters:

  • **kwargs: Additional layer arguments.

Input shape

Tensor with shape: (..., 4) containing [year, month, day, day_of_week]

Output shape

Tensor with shape: (..., 8) containing the original 4 components plus 4 one-hot encoded season values
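
Example

A minimal usage sketch (assuming numeric date components, matching the input contract above):

import keras
from kmr.layers import SeasonLayer

# Two dates as [year, month, day, day_of_week] components
dates = keras.ops.convert_to_tensor(
    [[2023.0, 1.0, 15.0, 6.0],
     [2023.0, 7.0, 4.0, 1.0]]
)

season = SeasonLayer()
out = season(dates)
print("Output shape:", out.shape)  # (2, 8): the 4 components plus 4 one-hot season values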

Initialize the layer.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape: Shape of the input tensor. Required.

Returns:

  • tuple[tuple[int, ...], tuple[int, ...]]: Output shape.

🧠 Attention Mechanisms

πŸ“Š ColumnAttention

Column-wise attention for tabular data to capture feature-level relationships.

kmr.layers.ColumnAttention

Column attention mechanism for weighting features dynamically.

Classes

ColumnAttention

ColumnAttention(input_dim, hidden_dim=None, **kwargs)

Column attention mechanism to weight features dynamically.

This layer applies attention weights to each feature (column) in the input tensor. The attention weights are computed using a two-layer neural network that takes the input features and outputs attention weights for each feature.

Example
import keras
from kmr.layers import ColumnAttention

# Create sample data
batch_size = 32
input_dim = 10
inputs = keras.random.normal((batch_size, input_dim))

# Apply column attention
attention = ColumnAttention(input_dim=input_dim)
weighted_outputs = attention(inputs)

Initialize column attention.

Parameters:

  • input_dim (int): Input dimension. Required.
  • hidden_dim (int | None): Hidden layer dimension. If None, uses input_dim // 2. Default: None.
  • **kwargs (dict[str, Any]): Additional layer arguments.

Functions
from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • ColumnAttention: ColumnAttention instance.

πŸ“‹ RowAttention

Row-wise attention mechanisms for sample-level pattern recognition.

kmr.layers.RowAttention

Row attention mechanism for weighting samples in a batch.

Classes

RowAttention

RowAttention(feature_dim, hidden_dim=None, **kwargs)

Row attention mechanism to weight samples dynamically.

This layer applies attention weights to each sample (row) in the input tensor. The attention weights are computed using a two-layer neural network that takes each sample as input and outputs a scalar attention weight.

Example
import keras
from kmr.layers import RowAttention

# Create sample data
batch_size = 32
feature_dim = 10
inputs = keras.random.normal((batch_size, feature_dim))

# Apply row attention
attention = RowAttention(feature_dim=feature_dim)
weighted_outputs = attention(inputs)

Initialize row attention.

Parameters:

  • feature_dim (int): Number of input features. Required.
  • hidden_dim (int | None): Hidden layer dimension. If None, uses feature_dim // 2. Default: None.
  • **kwargs (dict[str, Any]): Additional layer arguments.

Functions
from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • RowAttention: RowAttention instance.

πŸ” InterpretableMultiHeadAttention

Interpretable multi-head attention with attention weight analysis and visualization.

kmr.layers.InterpretableMultiHeadAttention

Interpretable Multi-Head Attention layer implementation.

Classes

InterpretableMultiHeadAttention

InterpretableMultiHeadAttention(
    d_model, n_head, dropout_rate=0.1, **kwargs
)

Interpretable Multi-Head Attention layer.

This layer wraps Keras MultiHeadAttention and stores the attention scores for interpretability purposes. The attention scores can be accessed via the attention_scores attribute after calling the layer.

Parameters:

  • d_model (int): Size of each attention head for query, key, value. Required.
  • n_head (int): Number of attention heads. Required.
  • dropout_rate (float): Dropout probability. Default: 0.1.
  • **kwargs (dict[str, Any]): Additional arguments passed to MultiHeadAttention. Supported arguments:
    • value_dim: Size of each attention head for value.
    • use_bias: Whether to use bias. Default: True.
    • output_shape: Expected output shape. Default: None.
    • attention_axes: Axes for attention. Default: None.
    • kernel_initializer: Initializer for kernels. Default: 'glorot_uniform'.
    • bias_initializer: Initializer for biases. Default: 'zeros'.
    • kernel_regularizer: Regularizer for kernels. Default: None.
    • bias_regularizer: Regularizer for biases. Default: None.
    • activity_regularizer: Regularizer for activity. Default: None.
    • kernel_constraint: Constraint for kernels. Default: None.
    • bias_constraint: Constraint for biases. Default: None.
    • seed: Random seed for dropout. Default: None.

Call Args

  • query: Query tensor of shape (B, S, E), where B is the batch size, S is the sequence length, and E is the feature dimension.
  • key: Key tensor of shape (B, S, E).
  • value: Value tensor of shape (B, S, E).
  • training: Python boolean indicating whether the layer should behave in training mode (applying dropout) or in inference mode (no dropout).

Returns:

  • output: Attention output of shape (B, S, E).

Example
import keras
from kmr.layers import InterpretableMultiHeadAttention

d_model = 64
n_head = 4
seq_len = 10
batch_size = 32

layer = InterpretableMultiHeadAttention(
    d_model=d_model,
    n_head=n_head,
    kernel_initializer='he_normal',
    use_bias=False
)
query = keras.random.normal((batch_size, seq_len, d_model))
output = layer(query, query, query)
attention_scores = layer.attention_scores  # Access attention weights

Initialize the layer.

Functions
from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • InterpretableMultiHeadAttention: Layer instance.

🎯 MultiResolutionTabularAttention

Multi-resolution attention for different feature scales and granularities.

kmr.layers.MultiResolutionTabularAttention

This module implements a MultiResolutionTabularAttention layer that applies separate attention mechanisms for numerical and categorical features, along with cross-attention between them. It's particularly useful for mixed-type tabular data.

Classes

MultiResolutionTabularAttention

MultiResolutionTabularAttention(
    num_heads,
    d_model,
    dropout_rate=0.1,
    name=None,
    **kwargs
)

Custom layer to apply multi-resolution attention for mixed-type tabular data.

This layer implements separate attention mechanisms for numerical and categorical features, along with cross-attention between them. It's designed to handle the different characteristics of numerical and categorical features in tabular data.

Parameters:

  • num_heads (int): Number of attention heads. Required.
  • d_model (int): Dimensionality of the attention model. Required.
  • dropout_rate (float): Dropout rate for regularization. Default: 0.1.
  • name (str): Name for the layer. Default: None.

Input shape

List of two tensors:
  • Numerical features: (batch_size, num_samples, num_numerical_features)
  • Categorical features: (batch_size, num_samples, num_categorical_features)

Output shape

List of two tensors with shapes:
  • Numerical features: (batch_size, num_samples, d_model)
  • Categorical features: (batch_size, num_samples, d_model)

Example
import keras
from kmr.layers import MultiResolutionTabularAttention

# Create sample input data
numerical = keras.random.normal((32, 100, 10))  # 32 batches, 100 samples, 10 numerical features
categorical = keras.random.normal((32, 100, 5))  # 32 batches, 100 samples, 5 categorical features

# Apply multi-resolution attention
attention = MultiResolutionTabularAttention(num_heads=4, d_model=32, dropout_rate=0.1)
num_out, cat_out = attention([numerical, categorical])
print("Numerical output shape:", num_out.shape)  # (32, 100, 32)
print("Categorical output shape:", cat_out.shape)  # (32, 100, 32)

Initialize the MultiResolutionTabularAttention.

Parameters:

  • num_heads (int): Number of attention heads. Required.
  • d_model (int): Model dimension. Required.
  • dropout_rate (float): Dropout rate. Default: 0.1.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (list[tuple[int, ...]]): List of shapes of the input tensors. Required.

Returns:

  • list[tuple[int, ...]]: List of shapes of the output tensors.

πŸ”€ Gated Networks

⚑ GatedLinearUnit

Gated linear unit for intelligent feature gating and selective information flow.

kmr.layers.GatedLinearUnit

This module implements a GatedLinearUnit layer that applies a gated linear transformation to input tensors. It's particularly useful for controlling information flow in neural networks.

Classes

GatedLinearUnit

GatedLinearUnit(units, name=None, **kwargs)

GatedLinearUnit is a custom Keras layer that implements a gated linear unit.

This layer applies a dense linear transformation to the input tensor and multiplies the result with the output of a dense sigmoid transformation. The result is a tensor where the input data is filtered based on the learned weights and biases of the layer.

Parameters:

  • units (int): Positive integer, dimensionality of the output space. Required.
  • name (str): Name for the layer. Default: None.

Input shape

Tensor with shape: (batch_size, ..., input_dim)

Output shape

Tensor with shape: (batch_size, ..., units)

Example
import keras
from kmr.layers import GatedLinearUnit

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Create the layer
glu = GatedLinearUnit(units=8)
y = glu(x)
print("Output shape:", y.shape)  # (32, 8)

Initialize the GatedLinearUnit layer.

Parameters:

  • units (int): Number of units in the layer. Required.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

πŸ”„ GatedResidualNetwork

Gated residual network for complex feature interactions and gradient flow.

kmr.layers.GatedResidualNetwork

This module implements a GatedResidualNetwork layer that combines residual connections with gated linear units for improved gradient flow and feature transformation.

Classes

GatedResidualNetwork

GatedResidualNetwork(
    units, dropout_rate=0.2, name=None, **kwargs
)

GatedResidualNetwork is a custom Keras layer that implements a gated residual network.

This layer applies a series of transformations to the input tensor and combines the result with the input using a residual connection. The transformations include a dense layer with ELU activation, a dense linear layer, a dropout layer, a gated linear unit layer, layer normalization, and a final dense layer.

Parameters:

  • units (int): Positive integer, dimensionality of the output space. Required.
  • dropout_rate (float): Dropout rate for regularization. Default: 0.2.
  • name (str): Name for the layer. Default: None.

Input shape

Tensor with shape: (batch_size, ..., input_dim)

Output shape

Tensor with shape: (batch_size, ..., units)

Example
import keras
from kmr.layers import GatedResidualNetwork

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Create the layer
grn = GatedResidualNetwork(units=16, dropout_rate=0.2)
y = grn(x)
print("Output shape:", y.shape)  # (32, 16)

Initialize the GatedResidualNetwork.

Parameters:

  • units (int): Number of units in the network. Required.
  • dropout_rate (float): Dropout rate. Default: 0.2.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

🎯 GatedFeaturesSelection

Gated feature selection mechanism for adaptive feature importance weighting.

kmr.layers.GatedFeaturesSelection

Classes

GatedFeatureSelection

GatedFeatureSelection(
    input_dim, reduction_ratio=4, **kwargs
)

Gated feature selection layer with residual connection.

This layer implements a learnable feature selection mechanism using a gating network. Each feature is assigned a dynamic importance weight between 0 and 1 through a multi-layer gating network. The gating network includes batch normalization and ReLU activations for stable training. A small residual connection (0.1) is added to maintain gradient flow.

The layer is particularly useful for:

  1. Dynamic feature importance learning
  2. Feature selection in time-series data
  3. Attention-like mechanisms for tabular data
  4. Reducing noise in input features

Example:

import numpy as np
from keras import layers, Model
from kmr.layers import GatedFeatureSelection

# Create sample input data
input_dim = 20
x = np.random.normal(size=(100, input_dim))

# Build model with gated feature selection
inputs = layers.Input(shape=(input_dim,))
x = GatedFeatureSelection(input_dim=input_dim, reduction_ratio=4)(inputs)
outputs = layers.Dense(1)(x)
model = Model(inputs=inputs, outputs=outputs)

# The layer will learn which features are most important
# and dynamically adjust their contribution to the output

Parameters:

  • input_dim (int): Dimension of the input features. Required.
  • reduction_ratio (int): Ratio to reduce the hidden dimension of the gating network. A higher ratio means fewer parameters but potentially less expressive gates. Default is 4, meaning the hidden dimension will be input_dim // 4.


Initialize the gated feature selection layer.

Parameters:

  • input_dim (int): Dimension of the input features. Must match the last dimension of the input tensor. Required.
  • reduction_ratio (int): Ratio to reduce the hidden dimension of the gating network. The hidden dimension will be max(input_dim // reduction_ratio, 1). Default: 4.
  • **kwargs (dict[str, Any]): Additional layer arguments passed to the parent Layer class.

Functions
from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • GatedFeatureSelection: GatedFeatureSelection instance.

πŸš€ Boosting Layers

πŸ“ˆ BoostingBlock

Gradient boosting inspired neural network block for sequential learning.

kmr.layers.BoostingBlock

This module implements a BoostingBlock layer that simulates gradient boosting behavior in a neural network. The layer computes a correction term via a configurable MLP and adds a scaled version to the input.

Classes

BoostingBlock

BoostingBlock(
    hidden_units=64,
    hidden_activation="relu",
    output_activation=None,
    gamma_trainable=True,
    gamma_initializer="ones",
    use_bias=True,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    dropout_rate=None,
    name=None,
    **kwargs
)

A neural network layer that simulates gradient boosting behavior.

This layer implements a weak learner that computes a correction term via a configurable MLP and adds a scaled version of this correction to the input. Stacking several such blocks can mimic the iterative residual-correction process of gradient boosting.

The output is computed as:

    output = inputs + gamma * f(inputs)

where:
  • f is a configurable MLP (default: two-layer network)
  • gamma is a learnable or fixed scaling factor

Parameters:

  • hidden_units (int | list[int]): Number of units in the hidden layer(s). Can be an int for a single hidden layer or a list of ints for multiple hidden layers. Default: 64.
  • hidden_activation (str): Activation function for hidden layers. Default: 'relu'.
  • output_activation (str | None): Activation function for the output layer. Default: None.
  • gamma_trainable (bool): Whether the scaling factor gamma is trainable. Default: True.
  • gamma_initializer (str | Initializer): Initializer for the gamma scaling factor. Default: 'ones'.
  • use_bias (bool): Whether to include bias terms in the dense layers. Default: True.
  • kernel_initializer (str | Initializer): Initializer for the dense layer kernels. Default: 'glorot_uniform'.
  • bias_initializer (str | Initializer): Initializer for the dense layer biases. Default: 'zeros'.
  • dropout_rate (float | None): Optional dropout rate to apply after hidden layers. Default: None.
  • name (str | None): Optional name for the layer. Default: None.

Input shape

N-D tensor with shape: (batch_size, ..., input_dim)

Output shape

Same shape as input: (batch_size, ..., input_dim)

Example
import keras
from kmr.layers import BoostingBlock

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Basic usage
block = BoostingBlock(hidden_units=64)
y = block(x)
print("Output shape:", y.shape)  # (32, 16)

# Advanced configuration
block = BoostingBlock(
    hidden_units=[32, 16],  # Two hidden layers
    hidden_activation='selu',
    dropout_rate=0.1,
    gamma_trainable=False
)
y = block(x)

Initialize the BoostingBlock layer.

Parameters:

  • hidden_units (int | list[int]): Number of hidden units or list of units per layer. Default: 64.
  • hidden_activation (str): Activation function for hidden layers. Default: 'relu'.
  • output_activation (str | None): Activation function for output layer. Default: None.
  • gamma_trainable (bool): Whether gamma parameter is trainable. Default: True.
  • gamma_initializer (str | Initializer): Initializer for gamma parameter. Default: 'ones'.
  • use_bias (bool): Whether to use bias. Default: True.
  • kernel_initializer (str | Initializer): Initializer for kernel weights. Default: 'glorot_uniform'.
  • bias_initializer (str | Initializer): Initializer for bias weights. Default: 'zeros'.
  • dropout_rate (float | None): Dropout rate. Default: None.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

🎯 BoostingEnsembleLayer

Ensemble of boosting blocks for improved performance and robustness.

kmr.layers.BoostingEnsembleLayer

This module implements a BoostingEnsembleLayer that aggregates multiple BoostingBlocks in parallel. Their outputs are combined via learnable weights to form an ensemble prediction. This is similar in spirit to boosting ensembles but implemented in a differentiable, end-to-end manner.

Classes

BoostingEnsembleLayer

BoostingEnsembleLayer(
    num_learners=3,
    learner_units=64,
    hidden_activation="relu",
    output_activation=None,
    gamma_trainable=True,
    dropout_rate=None,
    name=None,
    **kwargs
)

Ensemble layer of boosting blocks for tabular data.

This layer aggregates multiple boosting blocks (weak learners) in parallel. Each learner produces a correction to the input. A gating mechanism (via learnable weights) then computes a weighted sum of the learners' outputs.

Parameters:

  • num_learners (int): Number of boosting blocks in the ensemble. Default: 3.
  • learner_units (int | list[int]): Number of hidden units in each boosting block. Can be an int for a single hidden layer or a list of ints for multiple hidden layers. Default: 64.
  • hidden_activation (str): Activation function for hidden layers in boosting blocks. Default: 'relu'.
  • output_activation (str | None): Activation function for the output layer in boosting blocks. Default: None.
  • gamma_trainable (bool): Whether the scaling factor gamma in boosting blocks is trainable. Default: True.
  • dropout_rate (float | None): Optional dropout rate to apply in boosting blocks. Default: None.
  • name (str | None): Optional name for the layer. Default: None.

Input shape

N-D tensor with shape: (batch_size, ..., input_dim)

Output shape

Same shape as input: (batch_size, ..., input_dim)

Example
import keras
from kmr.layers import BoostingEnsembleLayer

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Basic usage
ensemble = BoostingEnsembleLayer(num_learners=3, learner_units=64)
y = ensemble(x)
print("Ensemble output shape:", y.shape)  # (32, 16)

# Advanced configuration
ensemble = BoostingEnsembleLayer(
    num_learners=5,
    learner_units=[32, 16],  # Two hidden layers in each learner
    hidden_activation='selu',
    dropout_rate=0.1
)
y = ensemble(x)

Initialize the BoostingEnsembleLayer.

Parameters:

  • num_learners (int): Number of boosting learners. Default: 3.
  • learner_units (int | list[int]): Number of units per learner or list of units. Default: 64.
  • hidden_activation (str): Activation function for hidden layers. Default: 'relu'.
  • output_activation (str | None): Activation function for output layer. Default: None.
  • gamma_trainable (bool): Whether gamma parameter is trainable. Default: True.
  • dropout_rate (float | None): Dropout rate. Default: None.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

πŸ—οΈ Specialized Layers

πŸ“‹ BusinessRulesLayer

Integration of business rules and domain knowledge into neural networks.

kmr.layers.BusinessRulesLayer

This module implements a BusinessRulesLayer that allows applying configurable business rules to neural network outputs. This enables combining learned patterns with explicit domain knowledge.

Classes

BusinessRulesLayer

BusinessRulesLayer(
    rules,
    feature_type,
    trainable_weights=True,
    weight_initializer="ones",
    name=None,
    **kwargs
)

Evaluates business-defined rules for anomaly detection.

This layer applies user-defined business rules to detect anomalies. Rules can be defined for both numerical and categorical features.

For numerical features:
  • Comparison operators: '>' and '<'
  • Example: [(">", 0), ("<", 100)] for range validation

For categorical features:
  • Set operators: '==', 'in', '!=', 'not in'
  • Example: [("in", ["red", "green", "blue"])] for valid categories

Attributes:

  • rules: List of rule tuples (operator, value).
  • feature_type: Type of feature ('numerical' or 'categorical').


Example
import numpy as np
from kmr.layers import BusinessRulesLayer

# Numerical rules
layer = BusinessRulesLayer(rules=[(">", 0), ("<", 100)], feature_type="numerical")
outputs = layer(np.array([[50.0], [-10.0]]))
print(outputs['business_anomaly'])  # [[False], [True]]

# Categorical rules
layer = BusinessRulesLayer(
    rules=[("in", ["red", "green"])],
    feature_type="categorical"
)
outputs = layer(np.array([["red"], ["blue"]]))
print(outputs['business_anomaly'])  # [[False], [True]]

Initializes the layer.

Parameters:

  • rules (list[Rule]): List of rule tuples (operator, value). Required.
  • feature_type (str): Type of feature ('numerical' or 'categorical'). Required.
  • trainable_weights (bool): Whether to use trainable weights for soft rule enforcement. Default: True.
  • weight_initializer (str | Initializer): Initializer for rule weights. Default: 'ones'.
  • name (str | None): Optional name for the layer. Default: None.
  • **kwargs (Any): Additional layer arguments.

Raises:

  • ValueError: If feature_type is invalid or rules have invalid operators.

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int | None, int]): Input shape tuple. Required.

Returns:

  • dict[str, tuple[int | None, int]]: Dictionary mapping output names to their shapes.

πŸ” NumericalAnomalyDetection

Anomaly detection for numerical features using statistical and ML methods.

kmr.layers.NumericalAnomalyDetection

Classes

NumericalAnomalyDetection

NumericalAnomalyDetection(
    hidden_dims,
    reconstruction_weight=0.5,
    distribution_weight=0.5,
    **kwargs
)

Numerical anomaly detection layer for identifying outliers in numerical features.

This layer learns a distribution for each numerical feature and outputs an anomaly score for each feature based on how far it deviates from the learned distribution. The layer uses a combination of mean, variance, and autoencoder reconstruction error to detect anomalies.

Example
import keras
from kmr.layers import NumericalAnomalyDetection

# Suppose we have 5 numerical features
x = keras.random.normal((32, 5))  # Batch of 32 samples
# Create a NumericalAnomalyDetection layer
anomaly_layer = NumericalAnomalyDetection(
    hidden_dims=[8, 4],
    reconstruction_weight=0.5,
    distribution_weight=0.5
)
anomaly_scores = anomaly_layer(x)
print("Anomaly scores shape:", anomaly_scores.shape)  # Expected: (32, 5)

Initialize the layer.

Parameters:

  • hidden_dims (list[int]): List of hidden dimensions for the autoencoder. Required.
  • reconstruction_weight (float): Weight for reconstruction error in anomaly score. Default: 0.5.
  • distribution_weight (float): Weight for distribution-based error in anomaly score. Default: 0.5.
  • **kwargs (dict[str, Any]): Additional keyword arguments.
Functions
compute_output_shape
compute_output_shape(input_shape)

Compute output shape.

Parameters:

  • input_shape (tuple[int, ...]): Input shape tuple. Required.

Returns:

  • tuple[int, ...]: Output shape tuple.

🏷️ CategoricalAnomalyDetectionLayer

Anomaly detection for categorical features with pattern recognition.

kmr.layers.CategoricalAnomalyDetectionLayer

Classes

CategoricalAnomalyDetectionLayer

CategoricalAnomalyDetectionLayer(dtype='string', **kwargs)

Backend-agnostic anomaly detection for categorical features.

This layer detects anomalies in categorical features by checking if values belong to a predefined set of valid categories. Values not in this set are considered anomalous.

The layer uses a Keras StringLookup or IntegerLookup layer internally to efficiently map input values to indices, which are then used to determine if a value is valid.

Attributes:

  • dtype (Any): The data type of input values ('string' or 'int32').
  • lookup: A Keras lookup layer for mapping values to indices.
  • vocabulary: List of valid categorical values.


Example
import numpy as np
from kmr.layers import CategoricalAnomalyDetectionLayer

layer = CategoricalAnomalyDetectionLayer(dtype='string')
layer.initialize_from_stats(vocabulary=['red', 'green', 'blue'])
outputs = layer(np.array([['red'], ['purple']]))
print(outputs['anomaly'])  # [[False], [True]]

Initializes the layer.

Parameters:

  • dtype (str): Data type of input values ('string' or 'int32'). Default: 'string'.
  • **kwargs: Additional layer arguments.

Raises:

  • ValueError: If dtype is not 'string' or 'int32'.

Attributes
dtype (property)

Get the dtype of the layer.

Functions
set_dtype
set_dtype(value)

Set the dtype and initialize the appropriate lookup layer.

initialize_from_stats
initialize_from_stats(vocabulary)

Initializes the layer with a vocabulary of valid values.

Parameters:

  • vocabulary (list[str | int]): List of valid categorical values. Required.

compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

  • input_shape (tuple[int | None, int]): Input shape tuple. Required.

Returns:

  • dict[str, tuple[int | None, int]]: Dictionary mapping output names to their shapes.

from_config classmethod
from_config(config)

Create layer from configuration.

βœ‚οΈ FeatureCutout

Feature cutout for data augmentation and regularization in tabular data.

kmr.layers.FeatureCutout

Feature cutout regularization layer for neural networks.

Classes

FeatureCutout

FeatureCutout(
    cutout_prob=0.1, noise_value=0.0, seed=None, **kwargs
)

Feature cutout regularization layer.

This layer randomly masks out (sets to zero) a specified fraction of features during training to improve model robustness and prevent overfitting. During inference, all features are kept intact.

Example
from keras import random
from kmr.layers import FeatureCutout

# Create sample data
batch_size = 32
feature_dim = 10
inputs = random.normal((batch_size, feature_dim))

# Apply feature cutout
cutout = FeatureCutout(cutout_prob=0.2)
masked_outputs = cutout(inputs, training=True)

Initialize feature cutout.

Parameters:

  • cutout_prob (float): Probability of masking each feature. Default: 0.1.
  • noise_value (float): Value to use for masked features. Default: 0.0.
  • seed (int | None): Random seed for reproducibility. Default: None.
  • **kwargs (dict[str, Any]): Additional layer arguments.

Raises:

  • ValueError: If cutout_prob is not in [0, 1].

Functions
compute_output_shape
compute_output_shape(input_shape)

Compute output shape.

Parameters:

  • input_shape (tuple[int, ...]): Input shape tuple. Required.

Returns:

  • tuple[int, ...]: Output shape tuple.

from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • FeatureCutout: FeatureCutout instance.

🎯 SparseAttentionWeighting

Sparse attention weighting mechanisms for efficient computation.

kmr.layers.SparseAttentionWeighting

Classes

SparseAttentionWeighting

SparseAttentionWeighting(
    num_modules, temperature=1.0, **kwargs
)

Sparse attention mechanism with temperature scaling for module outputs combination.

This layer implements a learnable attention mechanism that combines outputs from multiple modules using temperature-scaled attention weights. The attention weights are learned during training and can be made more or less sparse by adjusting the temperature parameter. A higher temperature leads to more uniform weights, while a lower temperature makes the weights more concentrated on specific modules.

Key features:

  1. Learnable module importance weights
  2. Temperature-controlled sparsity
  3. Softmax-based attention mechanism
  4. Support for variable number of input features per module

Example:

from keras import layers, Model
from kmr.layers import SparseAttentionWeighting

# Sample dimensions
num_modules = 3
feature_dim = 64

# A shared symbolic input feeding the three modules
inputs = layers.Input(shape=(feature_dim,))

# Create three different module outputs
module1 = layers.Dense(feature_dim)(inputs)
module2 = layers.Dense(feature_dim)(inputs)
module3 = layers.Dense(feature_dim)(inputs)

# Combine module outputs using sparse attention
attention = SparseAttentionWeighting(
    num_modules=num_modules,
    temperature=0.5  # Lower temperature for sharper attention
)
combined_output = attention([module1, module2, module3])
model = Model(inputs=inputs, outputs=combined_output)

# The layer will learn which modules are most important
# and weight their outputs accordingly

Parameters:

  • num_modules (int): Number of input modules whose outputs will be combined. Required.
  • temperature (float): Temperature parameter for softmax scaling. Default: 1.0.
    • temperature > 1.0: More uniform attention weights
    • temperature < 1.0: More sparse attention weights
    • temperature = 1.0: Standard softmax behavior


Initialize sparse attention weighting layer.

Parameters:

  • num_modules (int): Number of input modules to weight. Must be positive. Required.
  • temperature (float): Temperature parameter for softmax scaling. Must be positive. Controls the sparsity of attention weights: higher values (>1.0) lead to more uniform weights, lower values (<1.0) to more concentrated weights. Default: 1.0.
  • **kwargs (dict[str, Any]): Additional layer arguments passed to the parent Layer class.

Raises:

  • ValueError: If num_modules <= 0 or temperature <= 0.

Functions
from_config classmethod
from_config(config)

Create layer from configuration.

Parameters:

  • config (dict[str, Any]): Layer configuration dictionary. Required.

Returns:

  • SparseAttentionWeighting: SparseAttentionWeighting instance.

🎭 TabularMoELayer

Mixture of Experts for tabular data with adaptive expert selection.

kmr.layers.TabularMoELayer

This module implements a TabularMoELayer (Mixture-of-Experts) that routes input features through multiple expert sub-networks and aggregates their outputs via a learnable gating mechanism. This approach is useful for tabular data where different experts can specialize in different feature patterns.

Classes

TabularMoELayer

TabularMoELayer(
    num_experts=4, expert_units=16, name=None, **kwargs
)

Mixture-of-Experts layer for tabular data.

This layer routes input features through multiple expert sub-networks and aggregates their outputs via a learnable gating mechanism. Each expert is a small MLP, and the gate learns to weight their contributions.

Parameters:

  • num_experts (int): Number of expert networks. Default: 4.
  • expert_units (int): Number of hidden units in each expert network. Default: 16.
  • name (str | None): Optional name for the layer. Default: None.

Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
import keras
from kmr.layers import TabularMoELayer

# Tabular data with 8 features
x = keras.random.normal((32, 8))

# Create the layer with 4 experts and 16 units per expert
moe_layer = TabularMoELayer(num_experts=4, expert_units=16)
y = moe_layer(x)
print("MoE output shape:", y.shape)  # Expected: (32, 8)

Initialize the TabularMoELayer.

Parameters:

  • num_experts (int): Number of expert networks. Default: 4.
  • expert_units (int): Number of units in each expert. Default: 16.
  • name (str | None): Name of the layer. Default: None.
  • **kwargs (Any): Additional keyword arguments.

πŸ”§ Utility Layers

πŸ”’ CastToFloat32Layer

Type casting utility layer for ensuring consistent data types.

kmr.layers.CastToFloat32Layer

This module implements a CastToFloat32Layer that casts input tensors to float32 data type.

Classes

CastToFloat32Layer

CastToFloat32Layer(name=None, **kwargs)

Layer that casts input tensors to float32 data type.

This layer is useful for ensuring consistent data types in a model, especially when working with mixed precision or when receiving inputs of various data types.

Parameters:

  • name (str | None): Optional name for the layer. Default: None.

Input shape

Tensor of any shape and numeric data type.

Output shape

Same as input shape, but with float32 data type.

Example
import keras
import numpy as np
from kmr.layers import CastToFloat32Layer

# Create sample input data with int64 type
x = keras.ops.convert_to_tensor(np.array([1, 2, 3], dtype=np.int64))

# Apply casting layer
cast_layer = CastToFloat32Layer()
y = cast_layer(x)

print(y.dtype)  # float32

Initialize the CastToFloat32Layer.

Parameters:

Name Type Description Default
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape tuple[int, ...]

Shape of the input tensor.

required

Returns:

Type Description
tuple[int, ...]

Same shape as input.

βš™οΈ DifferentiableTabularPreprocessor

Differentiable preprocessing for tabular data with gradient flow.

kmr.layers.DifferentiableTabularPreprocessor

This module implements a DifferentiableTabularPreprocessor layer that integrates preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end. This approach is useful for tabular data with missing values and features that need normalization.

Classes

DifferentiableTabularPreprocessor

DifferentiableTabularPreprocessor(
    num_features, name=None, **kwargs
)

A differentiable preprocessing layer for numeric tabular data.

This layer
  • Replaces missing values (NaNs) with a learnable imputation vector.
  • Applies a learned affine transformation (scaling and shifting) to each feature.

The idea is to integrate preprocessing into the model so that the optimal imputation and normalization parameters are learned end-to-end.
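
The forward pass reduces to two differentiable steps, sketched below with plain Keras ops. The impute, scale, and shift vectors stand in for the layer's learnable weights; the variable names are illustrative:

import numpy as np
from keras import ops

x = ops.convert_to_tensor(
    np.array([[1.0, np.nan, 3.0], [np.nan, 2.0, 4.0]], dtype="float32")
)
impute = ops.zeros((3,))  # learnable imputation vector in the real layer
scale = ops.ones((3,))    # learnable per-feature scale
shift = ops.zeros((3,))   # learnable per-feature shift

x_imputed = ops.where(ops.isnan(x), impute, x)  # replace NaNs
y = x_imputed * scale + shift                   # learned affine transform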

Parameters:

Name Type Description Default
num_features int

Number of numeric features in the input.

required
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
import keras
import numpy as np
from kmr.layers import DifferentiableTabularPreprocessor

# Suppose we have tabular data with 5 numeric features
x = keras.ops.convert_to_tensor([
    [1.0, np.nan, 3.0, 4.0, 5.0],
    [2.0, 2.0, np.nan, 4.0, 5.0]
], dtype="float32")

preproc = DifferentiableTabularPreprocessor(num_features=5)
y = preproc(x)
print(y)

Initialize the DifferentiableTabularPreprocessor.

Parameters:

Name Type Description Default
num_features int

Number of input features.

required
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}

πŸ”„ DifferentialPreprocessingLayer

Differential preprocessing operations for advanced data transformations.

kmr.layers.DifferentialPreprocessingLayer

This module implements a DifferentialPreprocessingLayer that applies multiple candidate transformations to tabular data and learns to combine them optimally. It also handles missing values with learnable imputation. This approach is useful for tabular data where the optimal preprocessing strategy is not known in advance.

Classes

DifferentialPreprocessingLayer

DifferentialPreprocessingLayer(
    num_features, mlp_hidden_units=4, name=None, **kwargs
)

Differentiable preprocessing layer for numeric tabular data with multiple candidate transformations.

This layer
  1. Imputes missing values using a learnable imputation vector.
  2. Applies several candidate transformations:
     • Identity (pass-through)
     • Affine transformation (learnable scaling and bias)
     • Nonlinear transformation via a small MLP
     • Log transformation (using a softplus to ensure positivity)
  3. Learns softmax combination weights to aggregate the candidates.

The entire preprocessing pipeline is differentiable, so the network learns the optimal imputation and transformation jointly with downstream tasks.
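
The softmax-weighted combination of candidates can be sketched as follows. Only three of the four branches are shown, and the learnable pieces are stubbed out with constants for illustration:

import keras
from keras import ops

x = keras.random.normal((32, 4))

logits = ops.zeros((3,))  # learnable combination logits in the real layer
w = ops.softmax(logits)   # one weight per candidate transform

identity = x
affine = 1.5 * x + 0.1                   # learnable scale and bias in the real layer
log_branch = ops.log1p(ops.softplus(x))  # softplus keeps the log argument positive

y = w[0] * identity + w[1] * affine + w[2] * log_branch  # (32, 4)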

Parameters:

Name Type Description Default
num_features int

Number of numeric features in the input.

required
mlp_hidden_units int

Number of hidden units in the nonlinear branch. Default is 4.

4
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
import keras
import numpy as np
from kmr.layers import DifferentialPreprocessingLayer

# Create dummy data: 6 samples, 4 features (with some missing values)
x = keras.ops.convert_to_tensor([
    [1.0, 2.0, float('nan'), 4.0],
    [2.0, float('nan'), 3.0, 4.0],
    [float('nan'), 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, float('nan')],
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
], dtype="float32")

# Instantiate the layer for 4 features.
preproc_layer = DifferentialPreprocessingLayer(num_features=4, mlp_hidden_units=8)
y = preproc_layer(x)
print(y)

Initialize the DifferentialPreprocessingLayer.

Parameters:

Name Type Description Default
num_features int

Number of input features.

required
mlp_hidden_units int

Number of hidden units in MLP.

4
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}

πŸ“Š DistributionAwareEncoder

Distribution-aware feature encoding for optimal representation learning.

kmr.layers.DistributionAwareEncoder

This module implements a DistributionAwareEncoder layer that automatically detects the distribution type of input data and applies appropriate transformations and encodings. It builds upon the DistributionTransformLayer but adds more sophisticated distribution detection and specialized encoding for different distribution types.

Classes

DistributionAwareEncoder

DistributionAwareEncoder(
    embedding_dim=None,
    auto_detect=True,
    distribution_type="unknown",
    transform_type="auto",
    add_distribution_embedding=False,
    name=None,
    **kwargs
)

Layer that automatically detects and encodes data based on its distribution.

This layer first detects the distribution type of the input data and then applies appropriate transformations and encodings. It builds upon the DistributionTransformLayer but adds more sophisticated distribution detection and specialized encoding for different distribution types.

Parameters:

Name Type Description Default
embedding_dim int | None

Dimension of the output embedding. If None, the output will have the same dimension as the input. Default is None.

None
auto_detect bool

Whether to automatically detect the distribution type. If False, the layer will use the specified distribution_type. Default is True.

True
distribution_type str

The distribution type to use if auto_detect is False. Options are "normal", "exponential", "lognormal", "uniform", "beta", "bimodal", "heavy_tailed", "mixed", "bounded", "unknown". Default is "unknown".

'unknown'
transform_type str

The transformation type to use. If "auto", the layer will automatically select the best transformation based on the detected distribution. See DistributionTransformLayer for available options. Default is "auto".

'auto'
add_distribution_embedding bool

Whether to add a learned embedding of the distribution type to the output. Default is False.

False
name str | None

Optional name for the layer.

None
Input shape

N-D tensor with shape: (batch_size, ..., features).

Output shape

If embedding_dim is None, same shape as input: (batch_size, ..., features). If embedding_dim is specified: (batch_size, ..., embedding_dim). If add_distribution_embedding is True, the output will have an additional dimension for the distribution embedding.

Example
import keras
import numpy as np
from kmr.layers import DistributionAwareEncoder

# Create sample input data with different distributions
# Normal distribution
normal_data = keras.ops.convert_to_tensor(
    np.random.normal(0, 1, (100, 10)), dtype="float32"
)

# Exponential distribution
exp_data = keras.ops.convert_to_tensor(
    np.random.exponential(1, (100, 10)), dtype="float32"
)

# Create the encoder
encoder = DistributionAwareEncoder(embedding_dim=16, add_distribution_embedding=True)

# Apply to normal data
normal_encoded = encoder(normal_data)
print("Normal encoded shape:", normal_encoded.shape)  # (100, 16)

# Apply to exponential data
exp_encoded = encoder(exp_data)
print("Exponential encoded shape:", exp_encoded.shape)  # (100, 16)

Initialize the DistributionAwareEncoder.

Parameters:

Name Type Description Default
embedding_dim int | None

Embedding dimension.

None
auto_detect bool

Whether to auto-detect distribution type.

True
distribution_type str

Type of distribution.

'unknown'
transform_type str

Type of transformation to apply.

'auto'
add_distribution_embedding bool

Whether to add distribution embedding.

False
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}

πŸŽ›οΈ HyperZZWOperator

Hyperparameter-aware operator for adaptive model behavior.

kmr.layers.HyperZZWOperator

This module implements a HyperZZWOperator layer that computes context-dependent weights by multiplying inputs with hyper-kernels. This is a specialized layer for the Terminator model.

Classes

HyperZZWOperator

HyperZZWOperator(
    input_dim, context_dim=None, name=None, **kwargs
)

A layer that computes context-dependent weights by multiplying inputs with hyper-kernels.

This layer takes two inputs: the original input tensor and a context tensor. It generates hyper-kernels from the context and performs a context-dependent transformation of the input.
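
One way to picture the mechanism is a dense map from context to a per-sample kernel that multiplies the input elementwise. This is a hedged sketch of the idea, not the layer's actual internals:

import keras

inputs = keras.random.normal((32, 16))
context = keras.random.normal((32, 8))

kernel_generator = keras.layers.Dense(16)  # context -> hyper-kernel (assumed form)
hyper_kernel = kernel_generator(context)   # (32, 16), one kernel per sample
y = inputs * hyper_kernel                  # context-dependent weighting, (32, 16)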

Parameters:

Name Type Description Default
input_dim int

Dimension of the input features.

required
context_dim int | None

Optional dimension of the context features. If not provided, it will be inferred.

None
name str | None

Optional name for the layer.

None
Input

A list of two tensors:
  • inputs[0]: Input tensor with shape (batch_size, input_dim).
  • inputs[1]: Context tensor with shape (batch_size, context_dim).

Output shape

2D tensor with shape: (batch_size, input_dim) (same as input)

Example
import keras
from kmr.layers import HyperZZWOperator

# Create sample input data
inputs = keras.random.normal((32, 16))  # 32 samples, 16 features
context = keras.random.normal((32, 8))  # 32 samples, 8 context features

# Create the layer
zzw_op = HyperZZWOperator(input_dim=16, context_dim=8)
context_weights = zzw_op([inputs, context])
print("Output shape:", context_weights.shape)  # (32, 16)

Initialize the HyperZZWOperator.

Parameters:

Name Type Description Default
input_dim int

Input dimension.

required
context_dim int | None

Context dimension.

None
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}

🐌 SlowNetwork

Slow network architecture for careful and deliberate feature processing.

kmr.layers.SlowNetwork

This module implements a SlowNetwork layer that processes features through multiple dense layers. It's designed to be used as a component in more complex architectures.

Classes

SlowNetwork

SlowNetwork(
    input_dim, num_layers=3, units=128, name=None, **kwargs
)

A multi-layer network with configurable depth and width.

This layer processes input features through multiple dense layers with ReLU activations, and projects the output back to the original feature dimension.
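
Functionally this is comparable to the following plain Keras stack, a sketch under the documented behavior (ReLU hidden layers, linear projection back to input_dim):

import keras

# Comparable stack for input_dim=16, num_layers=3, units=64 (illustrative)
equivalent = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(16),  # project back to input_dim
])
y = equivalent(keras.random.normal((32, 16)))
print(y.shape)  # (32, 16)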

Parameters:

Name Type Description Default
input_dim int

Dimension of the input features.

required
num_layers int

Number of hidden layers. Default is 3.

3
units int

Number of units per hidden layer. Default is 128.

128
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, input_dim)

Output shape

2D tensor with shape: (batch_size, input_dim) (same as input)

Example
import keras
from kmr.layers import SlowNetwork

# Create sample input data
x = keras.random.normal((32, 16))  # 32 samples, 16 features

# Create the layer
slow_net = SlowNetwork(input_dim=16, num_layers=3, units=64)
y = slow_net(x)
print("Output shape:", y.shape)  # (32, 16)

Initialize the SlowNetwork layer.

Parameters:

Name Type Description Default
input_dim int

Input dimension.

required
num_layers int

Number of hidden layers.

3
units int

Number of units in each layer.

128
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}

πŸ“ TextPreprocessingLayer

Text preprocessing utilities for natural language features in tabular data.

kmr.layers.TextPreprocessingLayer

πŸ•ΈοΈ Graph and Advanced Features

🧠 AdvancedGraphFeature

Advanced graph-based feature processing with dynamic adjacency learning.

kmr.layers.AdvancedGraphFeature

Classes

AdvancedGraphFeatureLayer

AdvancedGraphFeatureLayer(
    embed_dim,
    num_heads,
    dropout_rate=0.0,
    hierarchical=False,
    num_groups=None,
    **kwargs
)

Advanced graph-based feature layer for tabular data.

This layer projects scalar features into an embedding space and then applies multi-head self-attention to compute data-dependent dynamic adjacencies between features. It learns edge attributes by considering both the raw embeddings and their differences. Optionally, a hierarchical aggregation is applied, where features are grouped via a learned soft-assignment and then re-expanded back to the original feature space. A residual connection and layer normalization are applied before the final projection back to the original feature space.

The layer is highly configurable, allowing for control over the embedding dimension, number of attention heads, dropout rate, and hierarchical aggregation.

Notes

When to Use This Layer:
  • When working with tabular data where feature interactions are important
  • For complex feature engineering tasks where manual feature crosses are insufficient
  • When dealing with heterogeneous features that require dynamic, learned relationships
  • In scenarios where feature importance varies across different samples
  • When hierarchical feature relationships exist in your data

Best Practices:
  • Start with a small embed_dim (e.g., 16 or 32) and increase if needed
  • Use num_heads=4 or 8 for most applications
  • Enable hierarchical=True when you have many features (>20) or known grouping structure
  • Set dropout_rate=0.1 or 0.2 for regularization during training
  • Use layer normalization (enabled by default) to stabilize training

Performance Considerations:
  • Memory usage scales quadratically with the number of features
  • Consider using hierarchical mode for large feature sets to reduce complexity
  • The layer works best with normalized input features
  • For very large feature sets (>100), consider feature pre-selection

Parameters:

Name Type Description Default
embed_dim int

Dimensionality of the projected feature embeddings. Determines the size of the learned feature representations.

required
num_heads int

Number of attention heads. Must divide embed_dim evenly. Each head learns different aspects of feature relationships.

required
dropout_rate float

Dropout rate applied to attention weights during training. Helps prevent overfitting. Defaults to 0.0.

0.0
hierarchical bool

Whether to apply hierarchical aggregation. If True, features are grouped into clusters, and aggregation is performed at the cluster level. Defaults to False.

False
num_groups int

Number of groups to cluster features into when hierarchical is True. Must be provided if hierarchical is True. Controls the granularity of hierarchical aggregation.

None

Raises:

Type Description
ValueError

If embed_dim is not divisible by num_heads. Ensures that the embedding dimension can be evenly split across attention heads.

ValueError

If hierarchical is True but num_groups is not provided. The number of groups must be specified when hierarchical aggregation is enabled.

Examples:

Basic Usage:

import keras
from kmr.layers import AdvancedGraphFeatureLayer

# Dummy tabular data with 10 features for 32 samples.
x = keras.random.normal((32, 10))
# Create the advanced graph layer with an embedding dimension of 16 and 4 heads.
layer = AdvancedGraphFeatureLayer(embed_dim=16, num_heads=4)
y = layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

With Hierarchical Aggregation:

import keras
from kmr.layers import AdvancedGraphFeatureLayer

# Dummy tabular data with 10 features for 32 samples.
x = keras.random.normal((32, 10))
# Create the advanced graph layer with hierarchical aggregation into 4 groups.
layer = AdvancedGraphFeatureLayer(embed_dim=16, num_heads=4, hierarchical=True, num_groups=4)
y = layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Without Training:

import keras
from kmr.layers import AdvancedGraphFeatureLayer

# Dummy tabular data with 10 features for 32 samples.
x = keras.random.normal((32, 10))
# Create the advanced graph layer with an embedding dimension of 16 and 4 heads.
layer = AdvancedGraphFeatureLayer(embed_dim=16, num_heads=4)
y = layer(x, training=False)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the AdvancedGraphFeatureLayer.

Parameters:

Name Type Description Default
embed_dim int

Embedding dimension.

required
num_heads int

Number of attention heads.

required
dropout_rate float

Dropout rate.

0.0
hierarchical bool

Whether to use hierarchical attention.

False
num_groups int | None

Number of groups for hierarchical attention.

None
**kwargs Any

Additional keyword arguments.

{}
Functions
compute_output_shape
compute_output_shape(input_shape)

Compute the output shape of the layer.

Parameters:

Name Type Description Default
input_shape

Shape tuple (batch_size, num_features)

required

Returns:

Type Description
tuple[int, ...]

Output shape tuple (batch_size, num_features)

πŸ”— GraphFeatureAggregation

Graph feature aggregation mechanisms for relationship modeling.

kmr.layers.GraphFeatureAggregation

This module implements a GraphFeatureAggregation layer that treats features as nodes in a graph and uses attention mechanisms to learn relationships between features. This approach is useful for tabular data where features have inherent relationships.

Classes

GraphFeatureAggregation

GraphFeatureAggregation(
    embed_dim=8,
    dropout_rate=0.0,
    leaky_relu_alpha=0.2,
    name=None,
    **kwargs
)

Graph-based feature aggregation layer with self-attention for tabular data.

This layer treats each input feature as a node and projects it into an embedding space. It then computes pairwise attention scores between features and aggregates feature information based on these scores. Finally, it projects the aggregated features back to the original feature space and adds a residual connection.

The process involves
  1. Projecting each scalar feature to an embedding (shape: [batch, num_features, embed_dim]).
  2. Computing pairwise concatenated embeddings and scoring them via a learnable attention vector.
  3. Normalizing the scores with softmax to yield a dynamic adjacency (attention) matrix.
  4. Aggregating neighboring features via weighted sum.
  5. Projecting back to a vector of original dimension, then adding a residual connection.
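
Steps 2-3 amount to GAT-style scoring; a minimal sketch with Keras ops (shapes hard-coded for readability, with a random vector standing in for the learnable attention weights):

import keras
from keras import ops

emb = keras.random.normal((32, 10, 8))  # (batch, num_features, embed_dim)
attn_vec = keras.random.normal((16,))   # learnable vector of size 2 * embed_dim

e_i = ops.broadcast_to(ops.expand_dims(emb, 2), (32, 10, 10, 8))
e_j = ops.broadcast_to(ops.expand_dims(emb, 1), (32, 10, 10, 8))
pairs = ops.concatenate([e_i, e_j], axis=-1)  # (32, 10, 10, 16)

scores = ops.leaky_relu(
    ops.tensordot(pairs, attn_vec, axes=[[3], [0]]), negative_slope=0.2
)                                             # (32, 10, 10)
adjacency = ops.softmax(scores, axis=-1)      # dynamic adjacency matrix
aggregated = ops.matmul(adjacency, emb)       # weighted sum over neighbors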

Parameters:

Name Type Description Default
embed_dim int

Dimensionality of the projected feature embeddings. Default is 8.

8
dropout_rate float

Dropout rate to apply on attention weights. Default is 0.0.

0.0
leaky_relu_alpha float

Alpha parameter for the LeakyReLU activation. Default is 0.2.

0.2
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
import keras
from kmr.layers import GraphFeatureAggregation

# Tabular data with 10 features
x = keras.random.normal((32, 10))

# Create the layer with an embedding dimension of 8 and dropout rate of 0.1
graph_layer = GraphFeatureAggregation(embed_dim=8, dropout_rate=0.1)
y = graph_layer(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the GraphFeatureAggregation layer.

Parameters:

Name Type Description Default
embed_dim int

Embedding dimension.

8
dropout_rate float

Dropout rate.

0.0
leaky_relu_alpha float

Alpha parameter for LeakyReLU.

0.2
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}

🎯 MultiHeadGraphFeaturePreprocessor

Multi-head graph feature preprocessing for complex feature interactions.

kmr.layers.MultiHeadGraphFeaturePreprocessor

This module implements a MultiHeadGraphFeaturePreprocessor layer that treats features as nodes in a graph and learns multiple "views" (heads) of the feature interactions via self-attention. This approach is useful for tabular data where complex feature relationships need to be captured.

Classes

MultiHeadGraphFeaturePreprocessor

MultiHeadGraphFeaturePreprocessor(
    embed_dim=16,
    num_heads=4,
    dropout_rate=0.0,
    name=None,
    **kwargs
)

Multi-head graph-based feature preprocessor for tabular data.

This layer treats each feature as a node and applies multi-head self-attention to capture and aggregate complex interactions among features. The process is:

  1. Project each scalar input into an embedding of dimension embed_dim.
  2. Split the embedding into num_heads heads.
  3. For each head, compute queries, keys, and values and calculate scaled dot-product attention across the feature dimension.
  4. Concatenate the head outputs, project back to the original feature dimension, and add a residual connection.

This mechanism allows the network to learn multiple relational views among features, which can significantly boost performance on tabular data.

Parameters:

Name Type Description Default
embed_dim int

Dimension of the feature embeddings. Default is 16.

16
num_heads int

Number of attention heads. Default is 4.

4
dropout_rate float

Dropout rate applied to attention weights. Default is 0.0.

0.0
name str | None

Optional name for the layer.

None
Input shape

2D tensor with shape: (batch_size, num_features)

Output shape

2D tensor with shape: (batch_size, num_features) (same as input)

Example
import keras
from kmr.layers import MultiHeadGraphFeaturePreprocessor

# Tabular data with 10 features
x = keras.random.normal((32, 10))

# Create the layer with 16-dim embeddings and 4 attention heads
graph_preproc = MultiHeadGraphFeaturePreprocessor(embed_dim=16, num_heads=4)
y = graph_preproc(x, training=True)
print("Output shape:", y.shape)  # Expected: (32, 10)

Initialize the MultiHeadGraphFeaturePreprocessor.

Parameters:

Name Type Description Default
embed_dim int

Embedding dimension.

16
num_heads int

Number of attention heads.

4
dropout_rate float

Dropout rate.

0.0
name str | None

Name of the layer.

None
**kwargs Any

Additional keyword arguments.

{}
Functions
split_heads
split_heads(x, batch_size)

Split the last dimension into (num_heads, depth) and transpose.

Parameters:

Name Type Description Default
x KerasTensor

Input tensor with shape (batch_size, num_features, embed_dim).

required
batch_size KerasTensor

Batch size tensor.

required

Returns:

Type Description
KerasTensor

Tensor with shape (batch_size, num_heads, num_features, depth).
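
The reshape-and-transpose behind split_heads can be illustrated directly, following the documented input and output shapes:

import keras
from keras import ops

x = keras.random.normal((32, 10, 16))  # (batch, num_features, embed_dim)
num_heads, depth = 4, 16 // 4

x = ops.reshape(x, (32, 10, num_heads, depth))  # split the embedding dimension
x = ops.transpose(x, (0, 2, 1, 3))              # move heads before features
print(x.shape)  # (32, 4, 10, 4)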