Decomposable Multi-Scale Mixing for Time Series Forecasting
Overview
TimeMixer is a time series forecasting model that combines series decomposition with multi-scale mixing to capture both trend and seasonal patterns. Its decomposable architecture first separates each series into trend and seasonal components, then applies mixing operations across multiple temporal scales to learn complex temporal patterns.
Key Features
Series Decomposition: Separates trend and seasonal components using moving average or DFT decomposition
Multi-Scale Mixing: Captures patterns at different time scales through downsampling layers
Reversible Instance Normalization: Optional normalization for improved training stability
Channel Independence: Supports both channel-dependent and channel-independent processing
Flexible Architecture: Configurable encoder layers, downsampling, and decomposition methods
Efficient: Designed for multivariate time series forecasting with linear complexity
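To make the two decomposition options concrete, here is an illustrative NumPy sketch of moving-average and DFT-based decomposition. This is not the package's internal code; the function names and the replicate-padding and top-k frequency choices are assumptions for illustration.

```python
import numpy as np

def decompose_moving_average(x, kernel_size=25):
    """Trend = moving average of the series; seasonal = residual.

    Edges are replicate-padded so the trend has the same length as x
    (an illustrative choice, not necessarily what TimeMixer uses).
    """
    pad_left = kernel_size // 2
    pad_right = kernel_size - 1 - pad_left
    x_padded = np.concatenate([np.full(pad_left, x[0]), x, np.full(pad_right, x[-1])])
    trend = np.convolve(x_padded, np.ones(kernel_size) / kernel_size, mode='valid')
    return trend, x - trend

def decompose_dft(x, top_k=5):
    """Seasonal = inverse FFT of the top_k highest-amplitude frequencies
    (excluding the DC term); trend = residual."""
    spec = np.fft.rfft(x)
    amp = np.abs(spec)
    amp[0] = 0.0                      # keep the mean out of the seasonal part
    keep = np.argsort(amp)[-top_k:]   # indices of the strongest frequencies
    mask = np.zeros_like(spec)        # complex zeros, same shape as spectrum
    mask[keep] = spec[keep]
    seasonal = np.fft.irfft(mask, n=len(x))
    return x - seasonal, seasonal
```

Both functions return a `(trend, seasonal)` pair that sums back to the input, which is the invariant the downstream mixing layers rely on.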
Parameters
seq_len (int): Input sequence length (number of lookback steps). Must be positive.
pred_len (int): Prediction horizon (forecast length). Must be positive.
n_features (int): Number of time series features. Must be positive.
d_model (int, default=32): Model dimension (hidden size).
```python
# With downsampling
model_downsample = TimeMixer(
    seq_len=96,
    pred_len=12,
    n_features=7,
    down_sampling_layers=2,
    down_sampling_window=4,
    down_sampling_method='avg',
)
```
Serialization
```python
# Save model
model.save('timemixer_model.keras')

# Load model
loaded_model = keras.models.load_model('timemixer_model.keras')

# Save weights only
model.save_weights('timemixer_weights.h5')

# Load weights
model_new = TimeMixer(seq_len=96, pred_len=12, n_features=7)
model_new.load_weights('timemixer_weights.h5')
```
Best Use Cases
Multivariate Time Series Forecasting: Multiple related time series with trend and seasonal patterns
Long-Horizon Forecasting: Effective for longer prediction horizons
Complex Temporal Patterns: Captures both trend and seasonal components
Multi-Scale Patterns: Handles patterns at different time scales through downsampling
Production Systems: Efficient inference with optional normalization
Performance Considerations
seq_len: Larger values capture longer-term dependencies but increase computation
e_layers: More encoder layers improve capacity but increase training time
d_model: Larger dimensions improve expressiveness but increase parameters
decomp_method: Moving average is faster, DFT may capture more complex patterns
down_sampling_layers: More layers capture multi-scale patterns but increase complexity
use_norm: Instance normalization improves training stability, especially for non-stationary data
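The `use_norm` point can be illustrated with a minimal RevIN-style sketch: normalize each instance by its own statistics over the time axis, then map the model output back to the original scale. This is a simplified version without the learnable affine parameters; the function names are assumptions, not the library's API.

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """Normalize each series instance over its own time axis.

    x has shape (batch, time, features); the statistics are returned so
    the forecast can later be mapped back to the original scale.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True) + eps
    return (x - mean) / std, (mean, std)

def revin_denormalize(y, stats):
    """Invert the per-instance normalization on the model output."""
    mean, std = stats
    return y * std + mean
```

Because the statistics are computed per instance rather than over the whole dataset, level shifts between windows of a non-stationary series are removed before the model sees the data, which is why this helps training stability.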
Comparison with Other Architectures
vs. TSMixer
Advantage: Multi-scale mixing and decomposition for better pattern capture
Disadvantage: More complex architecture with more hyperparameters
vs. Transformers
Advantage: More efficient, explicit decomposition of trend/seasonal
Disadvantage: May not capture very long-range dependencies as well
vs. NLinear/DLinear
Advantage: Multi-scale mixing and flexible decomposition methods
Disadvantage: More parameters and complexity
Notes
Reversible Instance Normalization (RevIN) is enabled by default and helps with non-stationary data
Series decomposition separates trend and seasonal components for better pattern learning
Multi-scale downsampling captures patterns at different temporal resolutions
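As a rough picture of the multi-scale downsampling, the sketch below builds a pyramid of progressively coarser views of a series by average pooling, mirroring the `down_sampling_layers` / `down_sampling_window` parameters with `down_sampling_method='avg'`. It is an illustration of the idea, not the model's internal implementation.

```python
import numpy as np

def multiscale_views(x, down_sampling_layers=2, down_sampling_window=2):
    """Build a pyramid of progressively coarser series by average pooling.

    Returns the original series plus one averaged view per layer; any
    trailing remainder that does not fill a window is dropped.
    """
    views = [x]
    for _ in range(down_sampling_layers):
        cur = views[-1]
        n = (len(cur) // down_sampling_window) * down_sampling_window
        views.append(cur[:n].reshape(-1, down_sampling_window).mean(axis=1))
    return views
```

For example, a length-96 input with two layers and a window of 4 yields views of length 96, 24, and 6; the mixing layers then exchange information between these resolutions.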