
🧩 Layers - Complete Reference & Explorer

36+ production-ready layers designed exclusively for Keras 3.
Build sophisticated tabular models with advanced attention, feature engineering, and preprocessing layers.
36+ Production Layers
8 Categories
100% Keras 3 Native
0% TensorFlow Lock-in

🎯 Why Use KerasFactory Layers?

| Challenge | Traditional Approach | KerasFactory's Solution |
| --- | --- | --- |
| 🔗 Feature Interactions | Manual feature crosses | 👁️ Tabular Attention - Automatic relationship discovery |
| 🏷️ Mixed Feature Types | Uniform processing | 🧩 Feature-wise Layers - Specialized processing per feature |
| 📊 Complex Distributions | Fixed strategies | 📊 Distribution-Aware Encoding - Adaptive transformations |
| Performance Optimization | Post-hoc analysis | 🎯 Built-in Selection - Learned during training |
| 🔒 Production Readiness | Extra tooling needed | Battle-Tested - Used in production models |

✨ Key Features

👁️

Attention Mechanisms

Automatically discover feature relationships and sample importance with advanced attention layers.

🧩

Feature-wise Processing

Each feature receives specialized processing through mixture of experts and dedicated layers.

📊

Distribution-Aware

Automatically adapt to different distributions with intelligent encoding and transformations.

Performance Ready

Optimized for production with built-in regularization and efficient memory usage.

🎯

Built-in Optimization

Integrated feature selection learns which features matter during training, not after.

🔒

Production Proven

Battle-tested in real-world ML pipelines with comprehensive testing and documentation.

💡 Pro Tip: Start with TabularAttention for feature relationships, VariableSelection for feature importance, and DifferentiableTabularPreprocessor for end-to-end preprocessing. Combine them for powerful custom architectures.


📚 All Layers by Category

⏱️ Time Series & Forecasting (16 layers)

Specialized layers for time series forecasting, decomposition, and feature extraction with multi-scale pattern recognition.

  • PositionalEmbedding - Sinusoidal positional encoding for sequence models
  • FixedEmbedding - Non-trainable embeddings for temporal indices (months, days, hours)
  • TokenEmbedding - 1D convolution-based embedding for time series values
  • TemporalEmbedding - Embedding layer for temporal features (month, day, weekday, hour, minute)
  • DataEmbeddingWithoutPosition - Combined token and temporal embedding for comprehensive features
  • MovingAverage - Trend extraction using moving average filtering
  • SeriesDecomposition - Trend-seasonal decomposition using moving average
  • DFTSeriesDecomposition - Frequency-based decomposition using Discrete Fourier Transform
  • ReversibleInstanceNorm - Reversible instance normalization with optional denormalization
  • ReversibleInstanceNormMultivariate - Multivariate reversible instance normalization
  • MultiScaleSeasonMixing - Bottom-up multi-scale seasonal pattern mixing
  • MultiScaleTrendMixing - Top-down multi-scale trend pattern mixing
  • PastDecomposableMixing - Decomposable mixing encoder combining decomposition and multi-scale mixing
  • TemporalMixing - MLP-based temporal mixing for TSMixer architecture
  • FeatureMixing - Feed-forward feature mixing for cross-series correlations
  • MixingLayer - Core mixing block combining temporal and feature mixing

🧠 Attention Mechanisms (6 layers)

Advanced attention layers for capturing complex feature relationships and dependencies in tabular data.

  • TabularAttention - Dual attention mechanism for inter-feature and inter-sample relationships
  • MultiResolutionTabularAttention - Multi-resolution attention for different feature scales
  • InterpretableMultiHeadAttention - Multi-head attention with explainability features
  • TransformerBlock - Standard transformer block with self-attention and feed-forward
  • ColumnAttention - Column-wise attention for feature relationships
  • RowAttention - Row-wise attention for sample relationships

🔧 Data Preprocessing & Transformation (9 layers)

Essential preprocessing layers for data cleaning, transformation, and preparation for optimal model performance.

  • DifferentiableTabularPreprocessor - End-to-end differentiable preprocessing with learnable imputation
  • DifferentialPreprocessingLayer - Multiple candidate transformations with learnable combination
  • DateParsingLayer - Flexible date parsing from various formats
  • DateEncodingLayer - Cyclical date feature encoding
  • SeasonLayer - Seasonal feature extraction
  • DistributionTransformLayer - Automatic distribution transformation
  • DistributionAwareEncoder - Distribution-aware feature encoding
  • CastToFloat32Layer - Type casting utility
  • AdvancedNumericalEmbedding - Advanced numerical embedding with dual-branch architecture

⚙️ Feature Engineering & Selection (5 layers)

Intelligent feature engineering and selection layers for identifying important features and creating powerful representations.

  • VariableSelection - Intelligent variable selection using gated residual networks
  • GatedFeatureSelection - Learnable feature selection with gating
  • GatedFeatureFusion - Gated mechanism for feature fusion
  • SparseAttentionWeighting - Sparse attention for efficient computation
  • FeatureCutout - Feature cutout for data augmentation and regularization

🏗️ Specialized Architectures (8 layers)

Advanced specialized layers for specific use cases including gated networks, boosting, business rules, and ensemble methods.

  • GatedResidualNetwork - Gated residual network with improved gradient flow
  • GatedLinearUnit - Gated linear transformation
  • TabularMoELayer - Mixture of Experts for adaptive expert selection
  • BoostingBlock - Gradient boosting inspired neural block
  • BoostingEnsembleLayer - Ensemble of boosting blocks
  • BusinessRulesLayer - Domain-specific business rules integration
  • StochasticDepth - Stochastic depth regularization
  • SlowNetwork - Careful feature processing with controlled information flow

🛠️ Utility & Graph Processing (8 layers)

Essential utility layers for data processing, graph operations, and anomaly detection.

  • GraphFeatureAggregation - Graph feature aggregation for relational learning
  • AdvancedGraphFeatureLayer - Advanced graph feature processing
  • MultiHeadGraphFeaturePreprocessor - Multi-head graph preprocessing
  • NumericalAnomalyDetection - Statistical anomaly detection for numerical features
  • CategoricalAnomalyDetectionLayer - Pattern-based anomaly detection for categorical features
  • HyperZZWOperator - Hyperparameter-aware operator for adaptive behavior

📋 Complete API Reference

⏱️ Time Series & Forecasting (16 layers)

Specialized layers for time series analysis, forecasting, and pattern recognition with advanced decomposition and mixing strategies.

📍 PositionalEmbedding

PositionalEmbedding(max_len, embedding_dim)

Sinusoidal positional encoding for sequence models and transformers.

Use when: You need position information in transformer models

🔧 FixedEmbedding

FixedEmbedding(num_embeddings, embedding_dim)

Non-trainable sinusoidal embeddings for temporal indices (months, days, hours).

Use when: You want fixed cyclical embeddings for temporal features

🎫 TokenEmbedding

TokenEmbedding(c_in, d_model, conv_kernel_size)

1D convolution-based embedding for time series values.

Use when: You need learnable embeddings for raw time series values
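
For illustration, a minimal sketch of embedding a raw multivariate series, assuming TokenEmbedding is called like a standard Keras layer on a (batch, seq_len, channels) tensor; the shapes, kernel size, and output dimensions below are placeholders, not part of the documented API.

import keras
from kerasfactory.layers import TokenEmbedding

series = keras.Input(shape=(96, 7))  # (seq_len, channels); values are illustrative
embedded = TokenEmbedding(c_in=7, d_model=64, conv_kernel_size=3)(series)  # -> (batch, 96, 64) assumed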

⏰ TemporalEmbedding

TemporalEmbedding(d_model, embed_type, freq)

Embedding layer for temporal features like month, day, weekday, hour, minute.

Use when: You have temporal feature information to encode

🎯 DataEmbeddingWithoutPosition

DataEmbeddingWithoutPosition(c_in, d_model, embedding_type, freq, dropout)

Combined token and temporal embedding for comprehensive feature representation.

Use when: You want unified embeddings for both values and temporal features

🏃 MovingAverage

MovingAverage(kernel_size)

Trend extraction using moving average filtering for time series.

Use when: You need to separate trends from seasonal components
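
A minimal sketch of trend extraction, assuming the layer is applied to a (batch, seq_len, channels) tensor and returns a smoothed series of the same shape; the kernel size and input shape are illustrative.

import keras
from kerasfactory.layers import MovingAverage

series = keras.Input(shape=(96, 1))            # univariate series; shape is illustrative
trend = MovingAverage(kernel_size=25)(series)  # smoothed trend component (same shape assumed)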

🔀 SeriesDecomposition

SeriesDecomposition(kernel_size)

Trend-seasonal decomposition using moving average filtering.

Use when: You want explicit decomposition of time series components

📊 DFTSeriesDecomposition

DFTSeriesDecomposition()

Frequency-based series decomposition using Discrete Fourier Transform.

Use when: You prefer frequency-domain decomposition

🔄 ReversibleInstanceNorm

ReversibleInstanceNorm(eps, subtract_last)

Reversible instance normalization with optional denormalization for time series.

Use when: You need reversible normalization for stable training

🏗️ ReversibleInstanceNormMultivariate

ReversibleInstanceNormMultivariate(eps)

Multivariate version of reversible instance normalization.

Use when: You have multivariate time series data

🌊 MultiScaleSeasonMixing

MultiScaleSeasonMixing(seq_len, down_sampling_window, d_model)

Bottom-up multi-scale seasonal pattern mixing with hierarchical aggregation.

Use when: You want to capture seasonal patterns at multiple scales

📈 MultiScaleTrendMixing

MultiScaleTrendMixing(seq_len, down_sampling_window, d_model)

Top-down multi-scale trend pattern mixing with hierarchical decomposition.

Use when: You want to capture trend patterns at multiple scales

🔀 PastDecomposableMixing

PastDecomposableMixing(seq_len, pred_len, d_model, decomp_method, down_sampling_window)

Past decomposable mixing encoder combining decomposition and multi-scale mixing.

Use when: You need comprehensive decomposition with multi-scale mixing

⏱️ TemporalMixing

TemporalMixing(seq_len, d_model, hidden_dim, dropout)

MLP-based temporal mixing for TSMixer that applies transformations across time.

Use when: You want lightweight temporal pattern learning

🔀 FeatureMixing

FeatureMixing(d_model, ff_dim, dropout)

Feed-forward feature mixing that learns cross-series correlations.

Use when: You want to capture dependencies between time series

🔀 MixingLayer

MixingLayer(seq_len, d_model, hidden_dim, ff_dim, dropout)

Core mixing block combining TemporalMixing and FeatureMixing for TSMixer.

Use when: You need dual-perspective temporal and feature learning
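
As a sketch, a TSMixer-style forecaster built around MixingLayer, assuming it operates on (batch, seq_len, d_model) tensors; the projection layers, head, and all dimensions are illustrative rather than part of the documented API.

import keras
from kerasfactory.layers import MixingLayer

seq_len, n_series, d_model = 96, 7, 32           # illustrative sizes
series = keras.Input(shape=(seq_len, n_series))
h = keras.layers.Dense(d_model)(series)          # project each timestep into d_model
h = MixingLayer(seq_len=seq_len, d_model=d_model, hidden_dim=64, ff_dim=64, dropout=0.1)(h)
forecast = keras.layers.Dense(1)(h)               # simple per-timestep head
model = keras.Model(series, forecast)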

🎯 Feature Selection & Gating (5 layers)

Layers for dynamic feature selection, gating mechanisms, and feature fusion.

🔀 VariableSelection

VariableSelection(nr_features, units, dropout_rate)

Dynamic feature selection using gated residual networks with optional context conditioning.

Use when: You need automatic feature importance learning during training

🚪 GatedFeatureSelection

GatedFeatureSelection(units, dropout_rate)

Feature selection layer using gating mechanisms for conditional feature routing.

Use when: You want learnable adaptive feature importance

🌊 GatedFeatureFusion

GatedFeatureFusion(hidden_dim, dropout)

Combines and fuses features using gated mechanisms for adaptive integration.

Use when: You need to intelligently combine multiple feature representations
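
A hedged sketch of fusing two feature branches, assuming the layer accepts a list of same-shaped tensors; the branch construction, sizes, and the list-input call are illustrative assumptions.

import keras
from kerasfactory.layers import GatedFeatureFusion

raw = keras.Input(shape=(20,))
branch_a = keras.layers.Dense(32, activation="relu")(raw)
branch_b = keras.layers.Dense(32, activation="relu")(raw)
fused = GatedFeatureFusion(hidden_dim=32, dropout=0.1)([branch_a, branch_b])  # list input is an assumption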

📍 GatedLinearUnit

GatedLinearUnit(units)

Gated linear transformation for controlling information flow.

Use when: You need selective information flow in your model

🔗 GatedResidualNetwork

GatedResidualNetwork(units, dropout_rate)

Gated residual network architecture with improved gradient flow.

Use when: You need robust feature processing with residual connections
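
A minimal sketch chaining GatedResidualNetwork and GatedLinearUnit on a flat feature vector; unit counts and the stacking order are illustrative.

import keras
from kerasfactory.layers import GatedLinearUnit, GatedResidualNetwork

features = keras.Input(shape=(32,))
h = GatedResidualNetwork(units=32, dropout_rate=0.1)(features)  # gated residual transform
h = GatedLinearUnit(units=32)(h)                                # learned gate on the output
model = keras.Model(features, h)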

👁️ Attention Mechanisms (6 layers)

Advanced attention layers for capturing complex feature and sample relationships.

🎯 TabularAttention

TabularAttention(num_heads, key_dim, dropout)

Dual attention mechanism for inter-feature and inter-sample relationships.

Use when: You have complex feature interactions to discover

📊 MultiResolutionTabularAttention

MultiResolutionTabularAttention(num_heads, key_dim, dropout)

Multi-resolution attention for numerical and categorical features.

Use when: You have mixed feature types needing different processing

🔍 InterpretableMultiHeadAttention

InterpretableMultiHeadAttention(num_heads, key_dim, dropout)

Multi-head attention with explainability features.

Use when: You need to understand attention patterns

🧠 TransformerBlock

TransformerBlock(dim_model, num_heads, ff_units, dropout)

Complete transformer block with self-attention and feed-forward.

Use when: You want standard transformer architecture for tabular data
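
A sketch of a transformer encoder over a tabular feature sequence, assuming the block consumes a (batch, num_features, dim_model) tensor; the token layout, pooling step, and sizes are illustrative.

import keras
from kerasfactory.layers import TransformerBlock

tokens = keras.Input(shape=(16, 32))   # 16 feature tokens of width 32; layout is an assumption
h = TransformerBlock(dim_model=32, num_heads=4, ff_units=64, dropout=0.1)(tokens)
pooled = keras.layers.GlobalAveragePooling1D()(h)
output = keras.layers.Dense(1, activation="sigmoid")(pooled)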

📌 ColumnAttention

ColumnAttention(hidden_dim, dropout)

Column-wise attention for feature relationships.

Use when: You want to focus on feature-level interactions

📍 RowAttention

RowAttention(hidden_dim, dropout)

Row-wise attention for sample relationships.

Use when: You want to capture sample-level patterns

📊 Data Preprocessing & Transformation (9 layers)

Essential preprocessing layers for data preparation and transformation.

🔄 DistributionTransformLayer

DistributionTransformLayer(transform_type, epsilon, method)

Automatic distribution transformation for normalizing skewed features.

Use when: You have skewed distributions that need normalization
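
A minimal sketch applying the transform inside the model graph; the "log" transform_type value is an assumed placeholder used only for illustration.

import keras
from kerasfactory.layers import DistributionTransformLayer

raw = keras.Input(shape=(8,))
normalized = DistributionTransformLayer(transform_type="log")(raw)  # transform skewed features in-graph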

🎓 DistributionAwareEncoder

DistributionAwareEncoder(encoding_dim, dropout, detection_method)

Distribution-aware feature encoding with auto-detection.

Use when: You need adaptive encoding based on data distributions

📈 AdvancedNumericalEmbedding

AdvancedNumericalEmbedding(embedding_dim, num_bins, hidden_dim)

Advanced numerical embedding with dual-branch architecture.

Use when: You want rich numerical feature representations

📅 DateParsingLayer

DateParsingLayer(date_formats, default_format)

Flexible date parsing from various formats.

Use when: You have date/time features to parse

🕐 DateEncodingLayer

DateEncodingLayer(min_year, max_year)

Cyclical date feature encoding.

Use when: You want cyclical representations of temporal features

🌙 SeasonLayer

SeasonLayer()

Seasonal feature extraction for temporal patterns.

Use when: Your data has seasonal patterns
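
A hedged sketch of a date-processing pipeline chaining the three layers above; the string input dtype, the argument values, and the assumption that each layer passes a tensor to the next are illustrative.

import keras
from kerasfactory.layers import DateParsingLayer, DateEncodingLayer, SeasonLayer

dates = keras.Input(shape=(1,), dtype="string")                                          # raw date strings; dtype is an assumption
parsed = DateParsingLayer(date_formats=["%Y-%m-%d"], default_format="%Y-%m-%d")(dates)  # argument values are placeholders
encoded = DateEncodingLayer(min_year=2000, max_year=2030)(parsed)                        # cyclical (sin/cos) encodings
with_season = SeasonLayer()(encoded)                                                     # append seasonal indicators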

🔀 DifferentialPreprocessingLayer

DifferentialPreprocessingLayer(transform_types, dropout)

Multiple transformations with learnable combination.

Use when: You want the model to learn optimal preprocessing

🔧 DifferentiableTabularPreprocessor

DifferentiableTabularPreprocessor(imputation_strategy, normalization, dropout)

End-to-end differentiable preprocessing.

Use when: You want learnable imputation and normalization
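
A minimal sketch placing the preprocessor at the front of a model, using the documented parameter names; the "mean" and "standard" argument values are placeholders.

import keras
from kerasfactory.layers import DifferentiableTabularPreprocessor

raw = keras.Input(shape=(12,))
x = DifferentiableTabularPreprocessor(
    imputation_strategy="mean",   # placeholder value
    normalization="standard",     # placeholder value
    dropout=0.0,
)(raw)
output = keras.layers.Dense(1)(x)
model = keras.Model(raw, output)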

🎨 CastToFloat32Layer

CastToFloat32Layer()

Type casting utility for float32 precision.

Use when: You need to ensure consistent data types

⚙️ Feature Engineering & Selection (5 layers)

Advanced feature engineering and selection layers.

🧬 GraphFeatureAggregation

GraphFeatureAggregation(aggregation_method, hidden_dim, dropout)

Graph feature aggregation for relational learning.

Use when: You have feature relationships to model

🎯 SparseAttentionWeighting

SparseAttentionWeighting(temperature, dropout, sparsity_threshold)

Sparse attention for efficient computation.

Use when: You need memory-efficient attention

🗑️ FeatureCutout

FeatureCutout(cutout_prob, noise_value, training_only)

Feature cutout for data augmentation.

Use when: You want to improve model robustness through augmentation
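
A minimal sketch using feature cutout as in-graph augmentation; the probability value is illustrative and the layer is assumed to mask features only when training.

import keras
from kerasfactory.layers import FeatureCutout

features = keras.Input(shape=(20,))
augmented = FeatureCutout(cutout_prob=0.1)(features)  # randomly mask features during training (assumed)
hidden = keras.layers.Dense(64, activation="relu")(augmented)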

🏗️ Specialized Architectures (8 layers)

Advanced specialized layers for specific use cases.

📈 BoostingBlock

BoostingBlock(hidden_units, hidden_activation, gamma_trainable)

Gradient boosting inspired neural block.

Use when: You want boosting-like behavior in neural networks
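
A sketch stacking two boosting blocks sequentially, assuming each block maps a feature vector to a same-width output; sizes and depth are illustrative.

import keras
from kerasfactory.layers import BoostingBlock

features = keras.Input(shape=(16,))
h = BoostingBlock(hidden_units=64, hidden_activation="relu")(features)
h = BoostingBlock(hidden_units=64, hidden_activation="relu")(h)   # stack blocks like boosting rounds
output = keras.layers.Dense(1)(h)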

🎯 BoostingEnsembleLayer

BoostingEnsembleLayer(num_learners, learner_units, hidden_activation)

Ensemble of boosting blocks.

Use when: You want ensemble-based learning

🏗️ BusinessRulesLayer

BusinessRulesLayer(rules, feature_type, trainable_weights)

Domain-specific business rules integration.

Use when: You need to enforce domain constraints

🐢 SlowNetwork

SlowNetwork(hidden_units, num_layers, activation, dropout)

Careful feature processing with controlled flow.

Use when: You want deliberate, well-controlled processing

⚡ HyperZZWOperator

HyperZZWOperator(hidden_units, hyperparameter_dim, activation)

Hyperparameter-aware operator for adaptive behavior.

Use when: You want dynamic hyperparameter adjustment

📊 TabularMoELayer

TabularMoELayer(num_experts, expert_units)

Mixture of Experts for tabular data.

Use when: You have diverse data requiring different expert processing
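
A minimal sketch routing tabular features through a mixture of experts; the expert count and unit sizes are illustrative.

import keras
from kerasfactory.layers import TabularMoELayer

features = keras.Input(shape=(24,))
h = TabularMoELayer(num_experts=4, expert_units=32)(features)  # experts selected adaptively per sample
output = keras.layers.Dense(1, activation="sigmoid")(h)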

🎲 StochasticDepth

StochasticDepth(survival_prob, scale_at_test)

Stochastic depth regularization.

Use when: You want improved generalization in deep networks

🛠️ Utility & Graph Processing (8 layers)

Utility layers for data processing, graph operations, and anomaly detection.

🧬 AdvancedGraphFeatureLayer

AdvancedGraphFeatureLayer(hidden_dim, num_heads, dropout, use_attention)

Advanced graph feature processing with dynamic learning.

Use when: You have complex feature relationships

👥 MultiHeadGraphFeaturePreprocessor

MultiHeadGraphFeaturePreprocessor(num_heads, hidden_dim, dropout, aggregation)

Multi-head graph preprocessing.

Use when: You want parallel feature processing

📉 NumericalAnomalyDetection

NumericalAnomalyDetection(method, contamination, threshold)

Statistical anomaly detection for numerical features.

Use when: You need to detect numerical outliers
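
A hedged sketch of scoring numerical inputs for anomalies; the "zscore" method value, the other argument values, and the assumption that the layer returns per-sample scores are illustrative.

import keras
from kerasfactory.layers import NumericalAnomalyDetection

values = keras.Input(shape=(10,))
scores = NumericalAnomalyDetection(method="zscore", contamination=0.05, threshold=3.0)(values)  # argument values are placeholders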

📊 CategoricalAnomalyDetectionLayer

CategoricalAnomalyDetectionLayer(method, threshold, min_frequency)

Pattern-based anomaly detection for categorical features.

Use when: You need to detect categorical anomalies


🚀 Quick Start Guide

Getting Started with KerasFactory Layers

**Step 1: Choose Your Base Layer**
- Start with `DifferentiableTabularPreprocessor` for data preparation
- Add `VariableSelection` for feature importance

**Step 2: Add Attention**
- Use `TabularAttention` to capture feature relationships

**Step 3: Build Your Model**
- Stack layers together for powerful architectures

**Example:**

import keras
from kerasfactory.layers import TabularAttention, VariableSelection

# 10 numeric input features
inputs = keras.Input(shape=(10,))
# learn per-feature importance with gated residual networks
x = VariableSelection(nr_features=10, units=64)(inputs)
# capture inter-feature and inter-sample relationships
x = TabularAttention(num_heads=4, key_dim=32)(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

📖 For More Information