Feature engineering is the process of creating, transforming, and selecting features to improve model performance. KerasFactory provides specialized layers for this purpose.
Why Feature Engineering Matters
```python
import numpy as np
import pandas as pd
from kerasfactory.layers import AdvancedNumericalEmbedding, DistributionAwareEncoder

# Example: Raw features vs engineered features
raw_features = np.random.normal(0, 1, (1000, 10))

# Raw features - limited representation
print("Raw features shape:", raw_features.shape)

# Engineered features - richer representation
embedding_layer = AdvancedNumericalEmbedding(embedding_dim=64)
engineered_features = embedding_layer(raw_features)
print("Engineered features shape:", engineered_features.shape)
```
🔢 Numerical Feature Processing
1. Advanced Numerical Embedding
Transform numerical features into rich embeddings:
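A minimal sketch of one way to wrap the layer in a model is below. The `embedding_dim=64` setting mirrors the example above; `create_numerical_embedding` is an illustrative helper name, not part of the library, and the exact layer signature may differ:

```python
import keras
import numpy as np
from kerasfactory.layers import AdvancedNumericalEmbedding

def create_numerical_embedding(input_dim):
    """Embed each raw numerical feature into a learned 64-dimensional space."""
    inputs = keras.Input(shape=(input_dim,))
    x = AdvancedNumericalEmbedding(embedding_dim=64)(inputs)
    return keras.Model(inputs, x)

# Usage
embedding_model = create_numerical_embedding(input_dim=10)
sample = np.random.normal(0, 1, (32, 10)).astype("float32")
print("Embedding shape:", embedding_model(sample).shape)
```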
2. Distribution Transformation
Transform features to follow specific distributions:
```python
import keras
from kerasfactory.layers import DistributionTransformLayer

def create_distribution_transform(input_dim):
    """Transform features toward a normal distribution."""
    inputs = keras.Input(shape=(input_dim,))

    # Distribution transformation
    x = DistributionTransformLayer(transform_type='auto', method='box-cox')(inputs)

    return keras.Model(inputs, x)

# Usage
transform_model = create_distribution_transform(input_dim=20)
```
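Box-Cox is only defined for strictly positive values, so right-skewed data such as log-normal samples is a natural test case. A sketch of applying the model, assuming the layer accepts a plain float tensor:

```python
import numpy as np

# Log-normal samples are strictly positive and right-skewed -- a good
# candidate for a Box-Cox-style transform toward normality.
skewed = np.random.lognormal(mean=0.0, sigma=1.0, size=(1000, 20)).astype("float32")
transformed = transform_model.predict(skewed, verbose=0)
print("Transformed shape:", transformed.shape)
```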
🏷️ Categorical Feature Handling
1. Date and Time Features
Process temporal features effectively:
```python
from kerasfactory.layers import DateParsingLayer, DateEncodingLayer, SeasonLayer

def create_temporal_features():
    """Create comprehensive temporal feature processing."""
    # Date parsing
    date_parser = DateParsingLayer()

    # Date encoding
    date_encoder = DateEncodingLayer(min_year=1900, max_year=2100)

    # Season extraction
    season_layer = SeasonLayer()

    return date_parser, date_encoder, season_layer

# Usage
date_parser, date_encoder, season_layer = create_temporal_features()

# Process date strings
date_strings = ['2023-01-15', '2023-06-20', '2023-12-25']
parsed_dates = date_parser(date_strings)
encoded_dates = date_encoder(parsed_dates)
seasonal_features = season_layer(parsed_dates)
```
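The separate outputs can then be merged into one feature tensor, for example by concatenation. A sketch, assuming the two tensors share a batch axis and compatible ranks (adjust the axis to the actual layer outputs):

```python
import keras

# Hypothetical combination step -- verify the shapes of the layer outputs first
temporal_features = keras.layers.Concatenate(axis=-1)([encoded_dates, seasonal_features])
```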
2. Text Preprocessing
Handle text features in tabular data:
```python
from kerasfactory.layers import TextPreprocessingLayer

def create_text_preprocessing():
    """Create a text preprocessing pipeline."""
    text_preprocessor = TextPreprocessingLayer(
        max_length=100,
        vocab_size=10000,
        tokenizer='word'
    )
    return text_preprocessor

# Usage
text_preprocessor = create_text_preprocessing()
```
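A sketch of applying the preprocessor to raw strings, assuming the layer is callable on a batch of strings and returns padded token ids:

```python
# Hypothetical usage -- assumes the layer maps raw strings to padded token ids
texts = ["red cotton shirt", "blue denim jacket", "green wool sweater"]
token_ids = text_preprocessor(texts)
print("Tokenized shape:", token_ids.shape)  # expected (3, 100) with max_length=100
```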
💡 Best Practices
1. Start Simple
Begin with a basic preprocessing pipeline and add complexity only as it proves useful; a wiring example follows the code below:

```python
from kerasfactory.layers import (
    AdvancedNumericalEmbedding,
    DifferentiableTabularPreprocessor,
    SparseAttentionWeighting,
    VariableSelection,
)

# Start with basic preprocessing
def basic_pipeline(inputs):
    x = DifferentiableTabularPreprocessor()(inputs)
    x = VariableSelection(hidden_dim=32)(x)
    return x

# Gradually add complexity
def advanced_pipeline(inputs):
    x = DifferentiableTabularPreprocessor()(inputs)
    x = AdvancedNumericalEmbedding(embedding_dim=64)(x)
    x = VariableSelection(hidden_dim=64)(x)
    x = SparseAttentionWeighting()(x)
    return x
```
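Either pipeline can then be dropped into a full model. A minimal sketch; the `input_dim=20` and the three-class output head are illustrative choices, not part of the library:

```python
import keras

# Hypothetical wiring of the basic pipeline into a classifier
inputs = keras.Input(shape=(20,))
features = basic_pipeline(inputs)
outputs = keras.layers.Dense(3, activation='softmax')(features)
model = keras.Model(inputs, outputs)
model.summary()
```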
2. Monitor Feature Importance
```python
import keras
from kerasfactory.layers import TabularAttention

# Use attention weights to understand feature importance
def create_interpretable_model(input_dim):
    inputs = keras.Input(shape=(input_dim,))

    # Attention layer that also returns its weights
    x, attention_weights = TabularAttention(
        num_heads=8,
        key_dim=64,
        use_attention_weights=True
    )(inputs)

    return keras.Model(inputs, [x, attention_weights])

# Get attention weights (X_test: held-out features, assumed to be defined)
model = create_interpretable_model(input_dim=20)
outputs, attention_weights = model.predict(X_test)
print("Attention weights shape:", attention_weights.shape)
```
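One way to turn the raw weights into a per-feature score is to average the attention each feature receives. A sketch, assuming a `(batch, heads, features, features)` weight shape; adjust the axes to the actual `TabularAttention` output:

```python
import numpy as np

# Average over batch, heads, and query positions to get one score per feature.
# The axis layout is an assumption -- check the layer's documentation.
importance = attention_weights.mean(axis=(0, 1, 2))
ranking = np.argsort(importance)[::-1]
print("Top-5 features by attention:", ranking[:5])
```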
3. Validate Impact
Confirm that the engineered features actually help by comparing against a raw-feature baseline:

```python
import keras
from kerasfactory.layers import (
    AdvancedNumericalEmbedding,
    DifferentiableTabularPreprocessor,
    TabularAttention,
    VariableSelection,
)

# Validate feature engineering impact
def compare_models(X_train, y_train, X_test, y_test):
    """Compare models with and without feature engineering."""
    # Model 1: raw features
    inputs1 = keras.Input(shape=(X_train.shape[1],))
    x1 = keras.layers.Dense(64, activation='relu')(inputs1)
    x1 = keras.layers.Dense(32, activation='relu')(x1)
    outputs1 = keras.layers.Dense(3, activation='softmax')(x1)
    model1 = keras.Model(inputs1, outputs1)

    # Model 2: with feature engineering
    inputs2 = keras.Input(shape=(X_train.shape[1],))
    x2 = DifferentiableTabularPreprocessor()(inputs2)
    x2 = AdvancedNumericalEmbedding(embedding_dim=64)(x2)
    x2 = VariableSelection(hidden_dim=64)(x2)
    x2 = TabularAttention(num_heads=8, key_dim=64)(x2)
    outputs2 = keras.layers.Dense(3, activation='softmax')(x2)
    model2 = keras.Model(inputs2, outputs2)

    # Compile and train both models on the same data
    for model in [model1, model2]:
        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'],
        )
        model.fit(X_train, y_train, epochs=10, verbose=0)

    # Compare performance
    score1 = model1.evaluate(X_test, y_test, verbose=0)
    score2 = model2.evaluate(X_test, y_test, verbose=0)

    print(f"Raw features accuracy: {score1[1]:.4f}")
    print(f"Engineered features accuracy: {score2[1]:.4f}")

    return model1, model2

# Usage
model1, model2 = compare_models(X_train, y_train, X_test, y_test)
```
📚 Next Steps
Model Building: Learn advanced model architectures
Examples: See real-world feature engineering applications
API Reference: Deep dive into layer parameters
Performance: Optimize your feature engineering pipeline