
🧩 Layers - Complete Reference & Explorer

36+ production-ready layers designed exclusively for Keras 3.
Build sophisticated tabular models with advanced attention, feature engineering, and preprocessing layers.
36+ Production Layers
8 Categories
100% Keras 3 Native
0% TensorFlow Lock-in

🎯 Why Use KerasFactory Layers?

| Challenge | Traditional Approach | KerasFactory's Solution |
| --- | --- | --- |
| 🔗 Feature Interactions | Manual feature crosses | 👁️ Tabular Attention - Automatic relationship discovery |
| 🏷️ Mixed Feature Types | Uniform processing | 🧩 Feature-wise Layers - Specialized processing per feature |
| 📊 Complex Distributions | Fixed strategies | 📊 Distribution-Aware Encoding - Adaptive transformations |
| Performance Optimization | Post-hoc analysis | 🎯 Built-in Selection - Learned during training |
| 🔒 Production Readiness | Extra tooling needed | Battle-Tested - Used in production models |

✨ Key Features

👁️

Attention Mechanisms

Automatically discover feature relationships and sample importance with advanced attention layers.

🧩

Feature-wise Processing

Each feature receives specialized processing through mixture of experts and dedicated layers.

📊

Distribution-Aware

Automatically adapt to different distributions with intelligent encoding and transformations.

Performance Ready

Optimized for production with built-in regularization and efficient memory usage.

🎯

Built-in Optimization

Integrated feature selection learns which features matter during training, not after.

🔒

Production Proven

Battle-tested in real-world ML pipelines with comprehensive testing and documentation.

💡 Pro Tip: Start with TabularAttention for feature relationships, VariableSelection for feature importance, and DifferentiableTabularPreprocessor for end-to-end preprocessing. Combine them for powerful custom architectures.


📚 All Layers by Category

⏱️ Time Series & Forecasting (16 layers)

Specialized layers for time series forecasting, decomposition, and feature extraction with multi-scale pattern recognition.

  • PositionalEmbedding - Sinusoidal positional encoding for sequence models
  • FixedEmbedding - Non-trainable embeddings for temporal indices (months, days, hours)
  • TokenEmbedding - 1D convolution-based embedding for time series values
  • TemporalEmbedding - Embedding layer for temporal features (month, day, weekday, hour, minute)
  • DataEmbeddingWithoutPosition - Combined token and temporal embedding for comprehensive features
  • MovingAverage - Trend extraction using moving average filtering
  • SeriesDecomposition - Trend-seasonal decomposition using moving average
  • DFTSeriesDecomposition - Frequency-based decomposition using Discrete Fourier Transform
  • ReversibleInstanceNorm - Reversible instance normalization with optional denormalization
  • ReversibleInstanceNormMultivariate - Multivariate reversible instance normalization
  • MultiScaleSeasonMixing - Bottom-up multi-scale seasonal pattern mixing
  • MultiScaleTrendMixing - Top-down multi-scale trend pattern mixing
  • PastDecomposableMixing - Decomposable mixing encoder combining decomposition and multi-scale mixing
  • TemporalMixing - MLP-based temporal mixing for TSMixer architecture
  • FeatureMixing - Feed-forward feature mixing for cross-series correlations
  • MixingLayer - Core mixing block combining temporal and feature mixing

🧠 Attention Mechanisms (6 layers)

Advanced attention layers for capturing complex feature relationships and dependencies in tabular data.

  • TabularAttention - Dual attention mechanism for inter-feature and inter-sample relationships
  • MultiResolutionTabularAttention - Multi-resolution attention for different feature scales
  • InterpretableMultiHeadAttention - Multi-head attention with explainability features
  • TransformerBlock - Standard transformer block with self-attention and feed-forward
  • ColumnAttention - Column-wise attention for feature relationships
  • RowAttention - Row-wise attention for sample relationships

🔧 Data Preprocessing & Transformation (9 layers)

Essential preprocessing layers for data cleaning, transformation, and preparation for optimal model performance.

  • DifferentiableTabularPreprocessor - End-to-end differentiable preprocessing with learnable imputation
  • DifferentialPreprocessingLayer - Multiple candidate transformations with learnable combination
  • DateParsingLayer - Flexible date parsing from various formats
  • DateEncodingLayer - Cyclical date feature encoding
  • SeasonLayer - Seasonal feature extraction
  • DistributionTransformLayer - Automatic distribution transformation
  • DistributionAwareEncoder - Distribution-aware feature encoding
  • CastToFloat32Layer - Type casting utility
  • AdvancedNumericalEmbedding - Advanced numerical embedding with dual-branch architecture

⚙️ Feature Engineering & Selection (5 layers)

Intelligent feature engineering and selection layers for identifying important features and creating powerful representations.

  • VariableSelection - Intelligent variable selection using gated residual networks
  • GatedFeatureSelection - Learnable feature selection with gating
  • GatedFeatureFusion - Gated mechanism for feature fusion
  • SparseAttentionWeighting - Sparse attention for efficient computation
  • FeatureCutout - Feature cutout for data augmentation and regularization

🏗️ Specialized Architectures (8 layers)

Advanced specialized layers for specific use cases including gated networks, boosting, business rules, and ensemble methods.

  • GatedResidualNetwork - Gated residual network with improved gradient flow
  • GatedLinearUnit - Gated linear transformation
  • TabularMoELayer - Mixture of Experts for adaptive expert selection
  • BoostingBlock - Gradient boosting inspired neural block
  • BoostingEnsembleLayer - Ensemble of boosting blocks
  • BusinessRulesLayer - Domain-specific business rules integration
  • StochasticDepth - Stochastic depth regularization
  • SlowNetwork - Careful feature processing with controlled information flow

🛠️ Utility & Graph Processing (8 layers)

Essential utility layers for data processing, graph operations, and anomaly detection.

  • GraphFeatureAggregation - Graph feature aggregation for relational learning
  • AdvancedGraphFeatureLayer - Advanced graph feature processing
  • MultiHeadGraphFeaturePreprocessor - Multi-head graph preprocessing
  • NumericalAnomalyDetection - Statistical anomaly detection for numerical features
  • CategoricalAnomalyDetectionLayer - Pattern-based anomaly detection for categorical features
  • HyperZZWOperator - Hyperparameter-aware operator for adaptive behavior

📋 Complete API Reference

⏱️ Time Series & Forecasting (16 layers)

Specialized layers for time series analysis, forecasting, and pattern recognition with advanced decomposition and mixing strategies.

📍 PositionalEmbedding

PositionalEmbedding(max_len, embedding_dim)

Sinusoidal positional encoding for sequence models and transformers.

Use when: You need position information in transformer models

🔧 FixedEmbedding

FixedEmbedding(num_embeddings, embedding_dim)

Non-trainable sinusoidal embeddings for temporal indices (months, days, hours).

Use when: You want fixed cyclical embeddings for temporal features

🎫 TokenEmbedding

TokenEmbedding(c_in, d_model, conv_kernel_size)

1D convolution-based embedding for time series values.

Use when: You need learnable embeddings for raw time series values
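
For illustration, a minimal sketch of embedding a raw multivariate series, assuming TokenEmbedding is called like a standard Keras layer on a (batch, seq_len, channels) tensor; the shapes, kernel size, and output dimensions below are placeholders, not part of the documented API.

import keras
from kerasfactory.layers import TokenEmbedding

series = keras.Input(shape=(96, 7))  # (seq_len, channels); values are illustrative
embedded = TokenEmbedding(c_in=7, d_model=64, conv_kernel_size=3)(series)  # -> (batch, 96, 64) assumed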

⏰ TemporalEmbedding

TemporalEmbedding(d_model, embed_type, freq)

Embedding layer for temporal features like month, day, weekday, hour, minute.

Use when: You have temporal feature information to encode

🎯 DataEmbeddingWithoutPosition

DataEmbeddingWithoutPosition(c_in, d_model, embedding_type, freq, dropout)

Combined token and temporal embedding for comprehensive feature representation.

Use when: You want unified embeddings for both values and temporal features

🏃 MovingAverage

MovingAverage(kernel_size)

Trend extraction using moving average filtering for time series.

Use when: You need to separate trends from seasonal components
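
A minimal sketch of trend extraction, assuming the layer is applied to a (batch, seq_len, channels) tensor and returns a smoothed series of the same shape; the kernel size and input shape are illustrative.

import keras
from kerasfactory.layers import MovingAverage

series = keras.Input(shape=(96, 1))            # univariate series; shape is illustrative
trend = MovingAverage(kernel_size=25)(series)  # smoothed trend component (same shape assumed)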

🔀 SeriesDecomposition

SeriesDecomposition(kernel_size)

Trend-seasonal decomposition using moving average filtering.

Use when: You want explicit decomposition of time series components

📊 DFTSeriesDecomposition

DFTSeriesDecomposition()

Frequency-based series decomposition using Discrete Fourier Transform.

Use when: You prefer frequency-domain decomposition

🔄 ReversibleInstanceNorm

ReversibleInstanceNorm(eps, subtract_last)

Reversible instance normalization with optional denormalization for time series.

Use when: You need reversible normalization for stable training

🏗️ ReversibleInstanceNormMultivariate

ReversibleInstanceNormMultivariate(eps)

Multivariate version of reversible instance normalization.

Use when: You have multivariate time series data

🌊 MultiScaleSeasonMixing

MultiScaleSeasonMixing(seq_len, down_sampling_window, d_model)

Bottom-up multi-scale seasonal pattern mixing with hierarchical aggregation.

Use when: You want to capture seasonal patterns at multiple scales

📈 MultiScaleTrendMixing

MultiScaleTrendMixing(seq_len, down_sampling_window, d_model)

Top-down multi-scale trend pattern mixing with hierarchical decomposition.

Use when: You want to capture trend patterns at multiple scales

🔀 PastDecomposableMixing

PastDecomposableMixing(seq_len, pred_len, d_model, decomp_method, down_sampling_window)

Past decomposable mixing encoder combining decomposition and multi-scale mixing.

Use when: You need comprehensive decomposition with multi-scale mixing

⏱️ TemporalMixing

TemporalMixing(seq_len, d_model, hidden_dim, dropout)

MLP-based temporal mixing for TSMixer that applies transformations across time.

Use when: You want lightweight temporal pattern learning

🔀 FeatureMixing

FeatureMixing(d_model, ff_dim, dropout)

Feed-forward feature mixing that learns cross-series correlations.

Use when: You want to capture dependencies between time series

🔀 MixingLayer

MixingLayer(seq_len, d_model, hidden_dim, ff_dim, dropout)

Core mixing block combining TemporalMixing and FeatureMixing for TSMixer.

Use when: You need dual-perspective temporal and feature learning
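
As a sketch, a TSMixer-style forecaster built around MixingLayer, assuming it operates on (batch, seq_len, d_model) tensors; the projection layers, head, and all dimensions are illustrative rather than part of the documented API.

import keras
from kerasfactory.layers import MixingLayer

seq_len, n_series, d_model = 96, 7, 32           # illustrative sizes
series = keras.Input(shape=(seq_len, n_series))
h = keras.layers.Dense(d_model)(series)          # project each timestep into d_model
h = MixingLayer(seq_len=seq_len, d_model=d_model, hidden_dim=64, ff_dim=64, dropout=0.1)(h)
forecast = keras.layers.Dense(1)(h)               # simple per-timestep head
model = keras.Model(series, forecast)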

🎯 Feature Selection & Gating (5 layers)

Layers for dynamic feature selection, gating mechanisms, and feature fusion.

🔀 VariableSelection

VariableSelection(nr_features, units, dropout_rate)

Dynamic feature selection using gated residual networks with optional context conditioning.

Use when: You need automatic feature importance learning during training

🚪 GatedFeatureSelection

GatedFeatureSelection(units, dropout_rate)

Feature selection layer using gating mechanisms for conditional feature routing.

Use when: You want learnable adaptive feature importance

🌊 GatedFeatureFusion

GatedFeatureFusion(hidden_dim, dropout)

Combines and fuses features using gated mechanisms for adaptive integration.

Use when: You need to intelligently combine multiple feature representations
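
A hedged sketch of fusing two feature branches, assuming the layer accepts a list of same-shaped tensors; the branch construction, sizes, and the list-input call are illustrative assumptions.

import keras
from kerasfactory.layers import GatedFeatureFusion

raw = keras.Input(shape=(20,))
branch_a = keras.layers.Dense(32, activation="relu")(raw)
branch_b = keras.layers.Dense(32, activation="relu")(raw)
fused = GatedFeatureFusion(hidden_dim=32, dropout=0.1)([branch_a, branch_b])  # list input is an assumption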

📍 GatedLinearUnit

GatedLinearUnit(units)

Gated linear transformation for controlling information flow.

Use when: You need selective information flow in your model

🔗 GatedResidualNetwork

GatedResidualNetwork(units, dropout_rate)

Gated residual network architecture with improved gradient flow.

Use when: You need robust feature processing with residual connections
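
A minimal sketch chaining GatedResidualNetwork and GatedLinearUnit on a flat feature vector; unit counts and the stacking order are illustrative.

import keras
from kerasfactory.layers import GatedLinearUnit, GatedResidualNetwork

features = keras.Input(shape=(32,))
h = GatedResidualNetwork(units=32, dropout_rate=0.1)(features)  # gated residual transform
h = GatedLinearUnit(units=32)(h)                                # learned gate on the output
model = keras.Model(features, h)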

👁️ Attention Mechanisms (6 layers)

Advanced attention layers for capturing complex feature and sample relationships.

🎯 TabularAttention

TabularAttention(num_heads, key_dim, dropout)

Dual attention mechanism for inter-feature and inter-sample relationships.

Use when: You have complex feature interactions to discover

📊 MultiResolutionTabularAttention

MultiResolutionTabularAttention(num_heads, key_dim, dropout)

Multi-resolution attention for numerical and categorical features.

Use when: You have mixed feature types needing different processing

🔍 InterpretableMultiHeadAttention

InterpretableMultiHeadAttention(num_heads, key_dim, dropout)

Multi-head attention with explainability features.

Use when: You need to understand attention patterns

🧠 TransformerBlock

TransformerBlock(dim_model, num_heads, ff_units, dropout)

Complete transformer block with self-attention and feed-forward.

Use when: You want standard transformer architecture for tabular data
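
A sketch of a transformer encoder over a tabular feature sequence, assuming the block consumes a (batch, num_features, dim_model) tensor; the token layout, pooling step, and sizes are illustrative.

import keras
from kerasfactory.layers import TransformerBlock

tokens = keras.Input(shape=(16, 32))   # 16 feature tokens of width 32; layout is an assumption
h = TransformerBlock(dim_model=32, num_heads=4, ff_units=64, dropout=0.1)(tokens)
pooled = keras.layers.GlobalAveragePooling1D()(h)
output = keras.layers.Dense(1, activation="sigmoid")(pooled)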

📌 ColumnAttention

ColumnAttention(hidden_dim, dropout)

Column-wise attention for feature relationships.

Use when: You want to focus on feature-level interactions

📍 RowAttention

RowAttention(hidden_dim, dropout)

Row-wise attention for sample relationships.

Use when: You want to capture sample-level patterns

📊 Data Preprocessing & Transformation (9 layers)

Essential preprocessing layers for data preparation and transformation.

🔄 DistributionTransformLayer

DistributionTransformLayer(transform_type, epsilon, method)

Automatic distribution transformation for normalizing skewed features.

Use when: You have skewed distributions that need normalization
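
A minimal sketch applying the transform inside the model graph; the "log" transform_type value is an assumed placeholder used only for illustration.

import keras
from kerasfactory.layers import DistributionTransformLayer

raw = keras.Input(shape=(8,))
normalized = DistributionTransformLayer(transform_type="log")(raw)  # transform skewed features in-graph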

🎓 DistributionAwareEncoder

DistributionAwareEncoder(encoding_dim, dropout, detection_method)

Distribution-aware feature encoding with auto-detection.

Use when: You need adaptive encoding based on data distributions

📈 AdvancedNumericalEmbedding

AdvancedNumericalEmbedding(embedding_dim, num_bins, hidden_dim)

Advanced numerical embedding with dual-branch architecture.

Use when: You want rich numerical feature representations

📅 DateParsingLayer

DateParsingLayer(date_formats, default_format)

Flexible date parsing from various formats.

Use when: You have date/time features to parse

🕐 DateEncodingLayer

DateEncodingLayer(min_year, max_year)

Cyclical date feature encoding.

Use when: You want cyclical representations of temporal features

🌙 SeasonLayer

SeasonLayer()

Seasonal feature extraction for temporal patterns.

Use when: Your data has seasonal patterns
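
A hedged sketch of a date-processing pipeline chaining the three layers above; the string input dtype, the argument values, and the assumption that each layer passes a tensor to the next are illustrative.

import keras
from kerasfactory.layers import DateParsingLayer, DateEncodingLayer, SeasonLayer

dates = keras.Input(shape=(1,), dtype="string")                                          # raw date strings; dtype is an assumption
parsed = DateParsingLayer(date_formats=["%Y-%m-%d"], default_format="%Y-%m-%d")(dates)  # argument values are placeholders
encoded = DateEncodingLayer(min_year=2000, max_year=2030)(parsed)                        # cyclical (sin/cos) encodings
with_season = SeasonLayer()(encoded)                                                     # append seasonal indicators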

🔀 DifferentialPreprocessingLayer

DifferentialPreprocessingLayer(transform_types, dropout)

Multiple transformations with learnable combination.

Use when: You want the model to learn optimal preprocessing

🔧 DifferentiableTabularPreprocessor

DifferentiableTabularPreprocessor(imputation_strategy, normalization, dropout)

End-to-end differentiable preprocessing.

Use when: You want learnable imputation and normalization
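
A minimal sketch placing the preprocessor at the front of a model, using the documented parameter names; the "mean" and "standard" argument values are placeholders.

import keras
from kerasfactory.layers import DifferentiableTabularPreprocessor

raw = keras.Input(shape=(12,))
x = DifferentiableTabularPreprocessor(
    imputation_strategy="mean",   # placeholder value
    normalization="standard",     # placeholder value
    dropout=0.0,
)(raw)
output = keras.layers.Dense(1)(x)
model = keras.Model(raw, output)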

🎨 CastToFloat32Layer

CastToFloat32Layer()

Type casting utility for float32 precision.

Use when: You need to ensure consistent data types

⚙️ Feature Engineering & Selection (5 layers)

Advanced feature engineering and selection layers.

🧬 GraphFeatureAggregation

GraphFeatureAggregation(aggregation_method, hidden_dim, dropout)

Graph feature aggregation for relational learning.

Use when: You have feature relationships to model

🎯 SparseAttentionWeighting

SparseAttentionWeighting(temperature, dropout, sparsity_threshold)

Sparse attention for efficient computation.

Use when: You need memory-efficient attention

🗑️ FeatureCutout

FeatureCutout(cutout_prob, noise_value, training_only)

Feature cutout for data augmentation.

Use when: You want to improve model robustness through augmentation
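
A minimal sketch using feature cutout as in-graph augmentation; the probability value is illustrative and the layer is assumed to mask features only when training.

import keras
from kerasfactory.layers import FeatureCutout

features = keras.Input(shape=(20,))
augmented = FeatureCutout(cutout_prob=0.1)(features)  # randomly mask features during training (assumed)
hidden = keras.layers.Dense(64, activation="relu")(augmented)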

🏗️ Specialized Architectures (8 layers)

Advanced specialized layers for specific use cases.

📈 BoostingBlock

BoostingBlock(hidden_units, hidden_activation, gamma_trainable)

Gradient boosting inspired neural block.

Use when: You want boosting-like behavior in neural networks
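
A sketch stacking two boosting blocks sequentially, assuming each block maps a feature vector to a same-width output; sizes and depth are illustrative.

import keras
from kerasfactory.layers import BoostingBlock

features = keras.Input(shape=(16,))
h = BoostingBlock(hidden_units=64, hidden_activation="relu")(features)
h = BoostingBlock(hidden_units=64, hidden_activation="relu")(h)   # stack blocks like boosting rounds
output = keras.layers.Dense(1)(h)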

🎯 BoostingEnsembleLayer

BoostingEnsembleLayer(num_learners, learner_units, hidden_activation)

Ensemble of boosting blocks.

Use when: You want ensemble-based learning

🏗️ BusinessRulesLayer

BusinessRulesLayer(rules, feature_type, trainable_weights)

Domain-specific business rules integration.

Use when: You need to enforce domain constraints

🐢 SlowNetwork

SlowNetwork(hidden_units, num_layers, activation, dropout)

Careful feature processing with controlled flow.

Use when: You want deliberate, well-controlled processing

⚡ HyperZZWOperator

HyperZZWOperator(hidden_units, hyperparameter_dim, activation)

Hyperparameter-aware operator for adaptive behavior.

Use when: You want dynamic hyperparameter adjustment

📊 TabularMoELayer

TabularMoELayer(num_experts, expert_units)

Mixture of Experts for tabular data.

Use when: You have diverse data requiring different expert processing
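
A minimal sketch routing tabular features through a mixture of experts; the expert count and unit sizes are illustrative.

import keras
from kerasfactory.layers import TabularMoELayer

features = keras.Input(shape=(24,))
h = TabularMoELayer(num_experts=4, expert_units=32)(features)  # experts selected adaptively per sample
output = keras.layers.Dense(1, activation="sigmoid")(h)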

🎲 StochasticDepth

StochasticDepth(survival_prob, scale_at_test)

Stochastic depth regularization.

Use when: You want improved generalization in deep networks

🛠️ Utility & Graph Processing (8 layers)

Utility layers for data processing, graph operations, and anomaly detection.

🧬 AdvancedGraphFeatureLayer

AdvancedGraphFeatureLayer(hidden_dim, num_heads, dropout, use_attention)

Advanced graph feature processing with dynamic learning.

Use when: You have complex feature relationships

👥 MultiHeadGraphFeaturePreprocessor

MultiHeadGraphFeaturePreprocessor(num_heads, hidden_dim, dropout, aggregation)

Multi-head graph preprocessing.

Use when: You want parallel feature processing

📉 NumericalAnomalyDetection

NumericalAnomalyDetection(method, contamination, threshold)

Statistical anomaly detection for numerical features.

Use when: You need to detect numerical outliers
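
A hedged sketch of scoring numerical inputs for anomalies; the "zscore" method value, the other argument values, and the assumption that the layer returns per-sample scores are illustrative.

import keras
from kerasfactory.layers import NumericalAnomalyDetection

values = keras.Input(shape=(10,))
scores = NumericalAnomalyDetection(method="zscore", contamination=0.05, threshold=3.0)(values)  # argument values are placeholders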

📊 CategoricalAnomalyDetectionLayer

CategoricalAnomalyDetectionLayer(method, threshold, min_frequency)

Pattern-based anomaly detection for categorical features.

Use when: You need to detect categorical anomalies


🚀 Quick Start Guide

Getting Started with KerasFactory Layers

**Step 1: Choose Your Base Layer**
- Start with `DifferentiableTabularPreprocessor` for data preparation
- Add `VariableSelection` for feature importance

**Step 2: Add Attention**
- Use `TabularAttention` to capture feature relationships

**Step 3: Build Your Model**
- Stack layers together for powerful architectures

**Example:**

import keras
from kerasfactory.layers import TabularAttention, VariableSelection

# 10 numeric input features
inputs = keras.Input(shape=(10,))
# learn per-feature importance with gated residual networks
x = VariableSelection(nr_features=10, units=64)(inputs)
# capture inter-feature and inter-sample relationships
x = TabularAttention(num_heads=4, key_dim=32)(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)

📖 For More Information