# 🎯 DataEmbeddingWithoutPosition

🟢 Beginner · ✅ Stable · ⏱️ Time Series
## 🎯 Overview

The DataEmbeddingWithoutPosition layer combines two complementary embedding modalities for time series data:

- **Token Embedding**: learns value representations from the raw time series
- **Temporal Embedding**: encodes calendar/temporal features (month, day, weekday, hour, minute)

It fuses value and temporal information into a single learnable representation, with integrated dropout for regularization.

Perfect for time series models that leverage both:

- Historical values (price, temperature, traffic volume)
- Temporal context (day-of-week effects, seasonal patterns, hourly patterns)
## 🔄 How It Works

```
Input: Raw Time Series Values        Input: Temporal Features
x: (batch, time, channels)           x_mark: (batch, time, 5)
          │                                   │
          ▼                                   ▼
┌────────────────────┐             ┌──────────────────────┐
│   TokenEmbedding   │             │  TemporalEmbedding   │
│   Conv1D learning  │             │  Month/Day/Hour...   │
└─────────┬──────────┘             └──────────┬───────────┘
          │                                   │
          ▼                                   ▼
 (batch, time, d_model)             (batch, time, d_model)
          │                                   │
          └─────────────────┬─────────────────┘
                            ▼
                  Element-wise Addition
                            │
                            ▼
                    ┌───────────────┐
                    │    Dropout    │
                    │  (optional)   │
                    └───────┬───────┘
                            │
                            ▼
              Output: (batch, time, d_model)
```
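Under the hood, the forward pass is simply the sum of the two sub-embeddings followed by dropout. The following is a minimal conceptual sketch of that computation, not the library's actual implementation: the TokenEmbedding is approximated with a plain Conv1D and the calendar lookups with a Dense stand-in, so only the shapes and the fusion logic carry over.

```python
import keras

# Conceptual stand-ins for the two sub-embeddings (shapes only)
token_emb = keras.layers.Conv1D(filters=64, kernel_size=3, padding='same')
temporal_emb = keras.layers.Dense(64, use_bias=False)  # stand-in for calendar lookups
drop = keras.layers.Dropout(0.1)

x = keras.random.normal((32, 96, 7))       # values
x_mark = keras.random.normal((32, 96, 5))  # temporal features (floats here, for the sketch)

# Value embedding + temporal embedding, then dropout
out = drop(token_emb(x) + temporal_emb(x_mark))
print(out.shape)  # (32, 96, 64)
```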
## 💡 Why Use This Layer?

| Scenario | Challenge | Solution |
|---|---|---|
| Multi-Modal Data | Combining values + temporal | ✨ Unified embedding |
| Calendar Effects | Day-of-week, seasonal patterns | 🗓️ Temporal feature support |
| Overfit Prevention | Training instability | 🎯 Integrated dropout |
| Flexible Input | Optional temporal features | ⚡ Handles both cases |
| End-to-End Learning | Manual feature engineering | 🧠 Learned representations |
## 📊 Use Cases

- **Load Forecasting**: energy demand with hourly/seasonal patterns
- **Stock Price Prediction**: historical prices + trading-hour patterns
- **Weather Forecasting**: temperature trends + time-of-day cycles
- **Traffic Prediction**: volume trends + weekday/rush-hour effects
- **Retail Sales**: historical sales + holiday/weekend features
- **Healthcare**: patient metrics + time-of-day variations
- **IoT Sensors**: sensor readings + temporal context
## 🚀 Quick Start

### Basic Usage

```python
import keras

from kerasfactory.layers import DataEmbeddingWithoutPosition

# Create the embedding layer
data_emb = DataEmbeddingWithoutPosition(
    c_in=7,       # number of input features
    d_model=64,   # output embedding dimension
    dropout=0.1,  # regularization
)

# Input data
x = keras.random.normal((32, 96, 7))  # 96 timesteps, 7 features
x_mark = keras.random.randint((32, 96, 5), minval=0, maxval=24)  # temporal features (int32)

# Combine embeddings
output = data_emb([x, x_mark])
print(f"Input shape: {x.shape}")          # (32, 96, 7)
print(f"Temporal shape: {x_mark.shape}")  # (32, 96, 5)
print(f"Output shape: {output.shape}")    # (32, 96, 64)
```
### Without Temporal Features

```python
# The layer automatically handles missing temporal info
output = data_emb(x)  # skips the temporal embedding
print(output.shape)   # (32, 96, 64)
```
### In a Complete Forecasting Pipeline

```python
import keras

from kerasfactory.layers import DataEmbeddingWithoutPosition, PositionalEmbedding

def create_time_series_model():
    # Inputs
    x_input = keras.Input(shape=(96, 7))       # values
    x_mark_input = keras.Input(shape=(96, 5))  # temporal features

    # Embed values + temporal features
    x_emb = DataEmbeddingWithoutPosition(
        c_in=7, d_model=64, dropout=0.1
    )([x_input, x_mark_input])

    # Add positional encoding
    pos_emb = PositionalEmbedding(max_len=96, d_model=64)(x_emb)
    x = x_emb + pos_emb

    # Transformer-style block
    x = keras.layers.MultiHeadAttention(num_heads=8, key_dim=8)(x, x)
    x = keras.layers.Dense(128, activation='relu')(x)
    x = keras.layers.LayerNormalization()(x)

    # Forecast layer
    outputs = keras.layers.Dense(7)(x)  # predict next values
    return keras.Model([x_input, x_mark_input], outputs)

model = create_time_series_model()
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```
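Training then follows the standard Keras workflow. A quick sketch with synthetic NumPy data, continuing from the model above (shapes and hyperparameters are illustrative only):

```python
import numpy as np

# Synthetic data, for illustration only
x_train = np.random.normal(size=(256, 96, 7)).astype("float32")
x_mark_train = np.random.randint(0, 24, size=(256, 96, 5)).astype("int32")
y_train = np.random.normal(size=(256, 96, 7)).astype("float32")

model.fit([x_train, x_mark_train], y_train,
          batch_size=32, epochs=2, validation_split=0.1)
```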
### With Custom Dropout Rates

```python
# Inputs with 8 features for this example
x8 = keras.random.normal((32, 96, 8))
x_mark = keras.random.randint((32, 96, 5), minval=0, maxval=24)

# High dropout for noisy data
data_emb_high_dropout = DataEmbeddingWithoutPosition(
    c_in=8, d_model=96, dropout=0.3
)

# Low dropout for clean data
data_emb_low_dropout = DataEmbeddingWithoutPosition(
    c_in=8, d_model=96, dropout=0.05
)

output1 = data_emb_high_dropout([x8, x_mark])
output2 = data_emb_low_dropout([x8, x_mark])
```
### Multi-Scale Embedding Ensemble

```python
import keras

from kerasfactory.layers import DataEmbeddingWithoutPosition

class MultiScaleEmbedding(keras.layers.Layer):
    """Runs several DataEmbeddingWithoutPosition layers at different widths."""

    def __init__(self, c_in, d_model=64):
        super().__init__()
        self.embed1 = DataEmbeddingWithoutPosition(c_in, d_model, dropout=0.1)
        self.embed2 = DataEmbeddingWithoutPosition(c_in, d_model // 2, dropout=0.1)
        self.embed3 = DataEmbeddingWithoutPosition(c_in, d_model // 4, dropout=0.1)

    def call(self, inputs):
        if isinstance(inputs, list):
            x, x_mark = inputs
        else:
            x = inputs
            x_mark = None
        e1 = self.embed1([x, x_mark] if x_mark is not None else x)
        e2 = self.embed2([x, x_mark] if x_mark is not None else x)
        e3 = self.embed3([x, x_mark] if x_mark is not None else x)
        # Concatenate the three scales along the feature axis
        return keras.layers.concatenate([e1, e2, e3], axis=-1)
```
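Calling the ensemble mirrors the base layer; continuing from the definition above, the output width is the sum of the three scales (here 64 + 32 + 16 = 112):

```python
ms_emb = MultiScaleEmbedding(c_in=7, d_model=64)
x = keras.random.normal((32, 96, 7))
x_mark = keras.random.randint((32, 96, 5), minval=0, maxval=24)
print(ms_emb([x, x_mark]).shape)  # (32, 96, 112)
```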
## 🔧 API Reference

### DataEmbeddingWithoutPosition

```python
kerasfactory.layers.DataEmbeddingWithoutPosition(
    c_in: int,
    d_model: int,
    dropout: float = 0.0,
    embed_type: str = 'fixed',
    freq: str = 'h',
    name: str | None = None,
    **kwargs: Any,
)
```
### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `c_in` | `int` | required | Number of input channels (features) |
| `d_model` | `int` | required | Output embedding dimension |
| `dropout` | `float` | `0.0` | Dropout rate for regularization |
| `embed_type` | `str` | `'fixed'` | Embedding type: `'fixed'` or `'learned'` |
| `freq` | `str` | `'h'` | Frequency for temporal features: `'h'` (hourly), `'t'` (minute), `'d'` (daily), etc. |
| `name` | `str \| None` | `None` | Optional layer name |
### Input Shape

**Option 1 (with temporal)**: list of 2 tensors

- x: (batch_size, time_steps, c_in) - raw time series values
- x_mark: (batch_size, time_steps, 5) - temporal features [month, day, weekday, hour, minute]

**Option 2 (values only)**: single tensor

- x: (batch_size, time_steps, c_in) - raw time series values

A sketch for building x_mark from real timestamps follows below.
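If your timestamps live in a pandas DatetimeIndex, a sketch like the following can assemble x_mark in the documented [month, day, weekday, hour, minute] order. The helper `make_x_mark` is hypothetical (not part of kerasfactory), and the exact value offsets expected per feature depend on the layer's internals:

```python
import numpy as np
import pandas as pd

def make_x_mark(index: pd.DatetimeIndex) -> np.ndarray:
    """Stack calendar features into a (time_steps, 5) int32 array."""
    # Offsets here are illustrative; check the ranges your install expects
    return np.stack(
        [index.month, index.day, index.weekday, index.hour, index.minute],
        axis=-1,
    ).astype("int32")

idx = pd.date_range("2024-01-01", periods=96, freq="h")
x_mark = make_x_mark(idx)[None, ...]  # add a batch axis -> (1, 96, 5)
```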
### Output Shape

(batch_size, time_steps, d_model) - combined embeddings

### Returns

Fused value and temporal embeddings with dropout applied.

- **Time Complexity**: O(time_steps × c_in × d_model + time_steps × d_model)
- **Space Complexity**: O(c_in × d_model + temporal_vocab_sizes)
- **Trainable Parameters**: ~c_in × d_model × kernel_size + embedding_params
- **Inference Speed**: fast; both embeddings are computed in parallel
- **Memory Efficient**: shared d_model dimension for both inputs
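As a rough worked example of the parameter estimate above, assume a Conv1D token embedding with kernel_size=3 and Informer-style temporal vocabulary sizes (13 months, 32 days, 7 weekdays, 24 hours, 60 minutes); both the kernel size and the vocabulary sizes are assumptions about the internals, not guaranteed by the API:

```python
c_in, d_model, kernel_size = 7, 64, 3        # kernel_size is an assumption
token_params = c_in * d_model * kernel_size  # 1,344

# Assumed vocab sizes for [month, day, weekday, hour, minute]
temporal_vocab = 13 + 32 + 7 + 24 + 60       # 136 lookup rows in total
temporal_params = temporal_vocab * d_model   # 8,704

print(token_params + temporal_params)        # 10,048 -> roughly 10k parameters
```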
## 🎨 Advanced Usage

### Dynamic Dropout During Training

```python
class AdaptiveDataEmbedding(keras.layers.Layer):
    def __init__(self, c_in, d_model):
        super().__init__()
        self.data_emb = DataEmbeddingWithoutPosition(c_in, d_model)
        self.adaptive_dropout = keras.layers.Dropout(0.2)

    def call(self, inputs, training=None):
        embeddings = self.data_emb(inputs)
        return self.adaptive_dropout(embeddings, training=training)
```
### With Layer Normalization

```python
def create_normalized_embedding():
    inputs_x = keras.Input(shape=(96, 7))
    inputs_mark = keras.Input(shape=(96, 5))

    # Embed
    x = DataEmbeddingWithoutPosition(
        c_in=7, d_model=64, dropout=0.1
    )([inputs_x, inputs_mark])

    # Normalize
    x = keras.layers.LayerNormalization(epsilon=1e-6)(x)
    return keras.Model([inputs_x, inputs_mark], x)
```
### Conditional Temporal Features

```python
def embed_with_optional_temporal(x, x_mark=None):
    data_emb = DataEmbeddingWithoutPosition(c_in=7, d_model=64)
    if x_mark is not None:
        return data_emb([x, x_mark])
    # Use only value embeddings
    return data_emb(x)

# Usage
x = keras.random.normal((32, 96, 7))
x_mark = keras.random.randint((32, 96, 5), minval=0, maxval=24)
output_with_temporal = embed_with_optional_temporal(x, x_mark)
output_without_temporal = embed_with_optional_temporal(x)
```
## 📊 Visual Representation

```
┌──────────────────────────────────────────────────────────┐
│               DataEmbeddingWithoutPosition               │
├──────────────────────────────────────────────────────────┤
│                                                          │
│   INPUT 1: Values              INPUT 2: Temporal         │
│   (batch, time, c_in)          (batch, time, 5)          │
│           │                         │                    │
│           ▼                         ▼                    │
│  ┌────────────────┐        ┌──────────────────┐          │
│  │ TokenEmbedding │        │TemporalEmbedding │          │
│  │     Conv1D     │        │  Month/Day/...   │          │
│  └────────┬───────┘        └────────┬─────────┘          │
│           │                         │                    │
│   (b, t, d_model)            (b, t, d_model)             │
│           │                         │                    │
│           └────────────┬────────────┘                    │
│                        ▼                                 │
│              Element-wise Addition                       │
│                        │                                 │
│                        ▼                                 │
│                ┌───────────────┐                         │
│                │    Dropout    │                         │
│                │    (p=...)    │                         │
│                └───────┬───────┘                         │
│                        │                                 │
│                        ▼                                 │
│            OUTPUT: (batch, time, d_model)                │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
## 💡 Best Practices

- **Matching Dimensions**: ensure d_model matches downstream layers
- **Dropout Tuning**: 0.1-0.2 for most cases; increase for noisy data
- **Temporal Feature Order**: month → day → weekday → hour → minute
- **Embed Type Selection**: 'fixed' for speed, 'learned' for accuracy
- **Frequency Setting**: match your data's sampling frequency ('h', 'd', 't')
- **Optional Features**: the layer handles a missing x_mark gracefully
- **Normalization**: consider LayerNorm after embedding
## ⚠️ Common Pitfalls

- ❌ **Wrong temporal ranges**: month should be 0-12, hour 0-23, minute 0-59
- ❌ **Mismatched c_in**: a wrong feature count causes shape errors
- ❌ **No dropout**: missing regularization can lead to overfitting
- ❌ **Ignoring x_mark**: always provide temporal features when available
- ❌ **Wrong freq setting**: a mismatch between freq and the data's sampling rate degrades the temporal features
- ❌ **d_model too small**: a tiny embedding dimension underfits

A range-check sketch follows this list.
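To catch the range mistakes above before training, a small pre-flight check can help. The helper below (`check_x_mark_ranges`, with bounds matching the list above; the day bound of 0-31 is our assumption) is illustrative, not part of the library:

```python
import numpy as np

# Inclusive (min, max) bounds per feature, in the documented order
X_MARK_BOUNDS = {
    "month": (0, 12), "day": (0, 31), "weekday": (0, 6),
    "hour": (0, 23), "minute": (0, 59),
}

def check_x_mark_ranges(x_mark: np.ndarray) -> None:
    """Raise ValueError if a temporal feature is outside its expected range."""
    for i, (name, (lo, hi)) in enumerate(X_MARK_BOUNDS.items()):
        col = x_mark[..., i]
        if col.min() < lo or col.max() > hi:
            raise ValueError(
                f"{name} out of range [{lo}, {hi}]: found [{col.min()}, {col.max()}]"
            )
```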
## 📚 References

- Vaswani, A., et al. (2017). "Attention Is All You Need"
- Zhou, H., et al. (2021). "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting"
- Nie, Y., et al. (2023). "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers"
## ✅ Serialization

```python
# Get configuration
config = data_emb.get_config()

# Save to file
import json

with open('data_embedding_config.json', 'w') as f:
    json.dump(config, f)

# Recreate from config
new_layer = DataEmbeddingWithoutPosition.from_config(config)
```
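Saving a whole model that contains the layer uses the standard Keras 3 `.keras` format. If the layer is not already registered as a serializable custom object in your install, passing it via `custom_objects` at load time should cover it (a sketch, reusing the pipeline model from above):

```python
model.save('forecaster.keras')

restored = keras.saving.load_model(
    'forecaster.keras',
    custom_objects={'DataEmbeddingWithoutPosition': DataEmbeddingWithoutPosition},
)
```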
## 🧪 Testing & Validation

```python
import keras

from kerasfactory.layers import DataEmbeddingWithoutPosition

# Test 1: with temporal features
data_emb = DataEmbeddingWithoutPosition(c_in=7, d_model=64, dropout=0.1)
x = keras.random.normal((32, 96, 7))
x_mark = keras.random.randint((32, 96, 5), minval=0, maxval=24)
output = data_emb([x, x_mark])
assert output.shape == (32, 96, 64), "Shape mismatch with temporal features"

# Test 2: without temporal features
output_no_temporal = data_emb(x)
assert output_no_temporal.shape == (32, 96, 64), "Shape mismatch without temporal"

# Test 3: different batch sizes
x_small = keras.random.normal((1, 96, 7))
x_large = keras.random.normal((256, 96, 7))
assert data_emb(x_small).shape[0] == 1
assert data_emb(x_large).shape[0] == 256

# Test 4: different time steps
x_short = keras.random.normal((32, 24, 7))
x_long = keras.random.normal((32, 256, 7))
assert data_emb(x_short).shape[1] == 24
assert data_emb(x_long).shape[1] == 256

print("✅ All validation tests passed!")
```
---

Last Updated: 2025-11-04 · Version: 1.0 · Keras: 3.0+ · Status: ✅ Production Ready