Anomaly Detection Model with Optional Preprocessing Integration
Overview
Autoencoder is a neural network model designed for anomaly detection. It learns to reconstruct normal patterns and identifies anomalies as data points with high reconstruction error. The model can optionally integrate with preprocessing models for production use, making it a unified solution for both training and inference.
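As a rough illustration of the scoring idea described above, the sketch below computes a per-sample reconstruction error with NumPy. The `reconstruction_error` helper and the choice of mean squared error are assumptions made for illustration; the library may measure the error differently.

```python
import numpy as np


def reconstruction_error(x, x_hat):
    """Per-sample mean squared error between inputs and reconstructions.

    Hypothetical helper for illustration, not part of the library.
    """
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    return np.mean((x - x_hat) ** 2, axis=1)


# Toy data: three samples reconstructed exactly, one reconstructed poorly.
x = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [2.0, 2.0]])
x_hat = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [0.0, 0.0]])

errors = reconstruction_error(x, x_hat)

# The sample with the largest error is the strongest anomaly candidate.
most_anomalous = int(np.argmax(errors))
```

A model trained only on normal data tends to reconstruct unfamiliar inputs poorly, which is why a large error serves as an anomaly signal.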
Key Features
Anomaly Detection: Identifies anomalies through reconstruction error
Adaptive Threshold: Learns threshold based on training data distribution
Preprocessing Integration: Optional preprocessing model for unified pipelines
Flexible Architecture: Configurable encoding and intermediate dimensions
Production Ready: Supports preprocessing models for deployment
Statistical Metrics: Tracks median and standard deviation of anomaly scores
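One plausible reading of the adaptive threshold and the tracked statistics is sketched below: anomaly scores from the training data supply a median and a standard deviation, and a sample is flagged when its score exceeds `median + threshold * std`. Both this rule and the helper names (`fit_score_stats`, `flag_anomalies`) are assumptions, not the library's confirmed implementation.

```python
import numpy as np


def fit_score_stats(train_scores):
    """Return (median, std) of the training anomaly scores (hypothetical helper)."""
    train_scores = np.asarray(train_scores, dtype=np.float64)
    return float(np.median(train_scores)), float(np.std(train_scores))


def flag_anomalies(scores, median, std, threshold=2.0):
    """Flag scores above median + threshold * std (assumed decision rule)."""
    scores = np.asarray(scores, dtype=np.float64)
    return scores > median + threshold * std


# Simulated training scores clustered around 1.0.
rng = np.random.default_rng(0)
train_scores = rng.normal(loc=1.0, scale=0.1, size=1000)
median, std = fit_score_stats(train_scores)

# Two typical scores and one far outside the training distribution.
new_scores = np.array([1.0, 1.05, 5.0])
flags = flag_anomalies(new_scores, median, std, threshold=2.0)
```

Using the median rather than the mean as the center keeps the cutoff stable when a few extreme scores are present in the training data.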
Parameters
input_dim (int): Dimension of the input data. Must be positive.
encoding_dim (int, default=64): Dimension of the encoded representation.
intermediate_dim (int, default=32): Dimension of the intermediate layer.
threshold (float, default=2.0): Initial threshold for anomaly detection.
    # Create model with custom threshold
    model = Autoencoder(
        input_dim=32,
        encoding_dim=16,
        threshold=3.0,  # Higher threshold = fewer anomalies detected
    )

    # Or update threshold after training
    model.update_threshold(2.5)
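To make the effect of the threshold concrete, the sketch below counts how many simulated anomaly scores are flagged at several threshold values. It assumes the cutoff has the form `median + threshold * std`, which is an illustrative guess rather than the library's documented rule, and the scores are synthetic.

```python
import numpy as np

# Synthetic, right-skewed anomaly scores (most small, a few large).
rng = np.random.default_rng(42)
scores = rng.exponential(scale=1.0, size=5000)
median, std = np.median(scores), np.std(scores)

# Count flagged samples for each candidate threshold.
counts = {}
for threshold in (1.0, 2.0, 3.0):
    cutoff = median + threshold * std  # assumed form of the decision rule
    counts[threshold] = int(np.sum(scores > cutoff))

for threshold, count in counts.items():
    print(f"threshold={threshold}: {count} flagged")
```

A sweep like this on held-out scores is one way to choose a value to pass to `update_threshold`.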
With Preprocessing Model
    import numpy as np
    import pandas as pd
    from kerasfactory.utils.data_analyzer import DataAnalyzer

    # Create preprocessing model
    df = pd.DataFrame(np.random.randn(1000, 32))
    analyzer = DataAnalyzer(df)
    preprocessing_model = analyzer.create_preprocessing_model()

    # Create model with preprocessing
    model = Autoencoder(
        input_dim=32,
        encoding_dim=16,
        preprocessing_model=preprocessing_model,
    )

    # Train (X_train serves as both input and reconstruction target)
    model.fit(X_train, X_train, epochs=50)
Different Architectures
    # Small bottleneck (more compression)
    model_small = Autoencoder(
        input_dim=32,
        encoding_dim=8,  # Smaller encoding
        intermediate_dim=4,
    )

    # Large bottleneck (less compression)
    model_large = Autoencoder(
        input_dim=32,
        encoding_dim=24,  # Larger encoding
        intermediate_dim=16,
    )

    # Deep architecture (more layers)
    # Note: You may need to modify the model to add more layers
    model_deep = Autoencoder(
        input_dim=32,
        encoding_dim=16,
        intermediate_dim=8,
    )
Evaluation Metrics
    import keras
    import numpy as np

    # Create metrics
    accuracy_metric = keras.metrics.BinaryAccuracy()
    precision_metric = keras.metrics.Precision()
    recall_metric = keras.metrics.Recall()

    # Get predictions
    anomaly_results = model.is_anomaly(test_data)
    predicted_anomalies = anomaly_results['anomaly'].numpy().astype(np.float32)

    # Update metrics
    test_labels = (test_labels > 0).astype(np.float32)  # Convert to binary
    accuracy_metric.update_state(test_labels, predicted_anomalies)
    precision_metric.update_state(test_labels, predicted_anomalies)
    recall_metric.update_state(test_labels, predicted_anomalies)

    print(f"Accuracy: {accuracy_metric.result().numpy()}")
    print(f"Precision: {precision_metric.result().numpy()}")
    print(f"Recall: {recall_metric.result().numpy()}")
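For readers who want to see what these three metrics measure, the NumPy sketch below computes them directly from binary labels and predictions. The `binary_metrics` helper is hypothetical and exists only for illustration; the labels and predictions are made up.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary anomaly labels (1 = anomaly)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)

    tp = int(np.sum(y_true & y_pred))    # anomalies correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # normal samples wrongly flagged
    fn = int(np.sum(y_true & ~y_pred))   # anomalies missed
    tn = int(np.sum(~y_true & ~y_pred))  # normal samples correctly passed

    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up ground truth and predictions for six samples.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because anomalies are usually rare, accuracy alone can look high even when most anomalies are missed, so precision and recall are generally the more informative pair.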
Serialization
    # Save model
    model.save('autoencoder_model.keras')

    # Load model
    loaded_model = keras.models.load_model('autoencoder_model.keras')

    # Save weights only (Keras 3 requires the filename to end in .weights.h5)
    model.save_weights('autoencoder.weights.h5')

    # Load weights into a model created with the same architecture
    model_new = Autoencoder(input_dim=32, encoding_dim=16)
    model_new.load_weights('autoencoder.weights.h5')
Best Use Cases
Anomaly Detection: Identifying outliers in normal data patterns
Fraud Detection: Detecting fraudulent transactions or activities
Quality Control: Identifying defective products or processes
Network Security: Detecting intrusions or unusual network behavior
Production Monitoring: Detecting anomalies in production systems
Performance Considerations
encoding_dim: Smaller values create stronger compression but may lose important information
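intermediate_dim: Controls the width of the intermediate layer; larger values increase model capacity and can improve reconstruction quality, at the cost of more parameters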
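The compression trade-off can be illustrated with a linear stand-in. The sketch below uses PCA via NumPy's SVD, not the library's Autoencoder, to show how reconstruction error grows as the bottleneck shrinks. The `linear_bottleneck_error` helper and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def linear_bottleneck_error(x, k):
    """Mean squared reconstruction error when keeping k principal components."""
    x_centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    components = vt[:k]
    # Project onto the k-dimensional subspace and map back to input space.
    x_hat = x_centered @ components.T @ components
    return float(np.mean((x_centered - x_hat) ** 2))


# Synthetic data: 32 features driven by 8 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 8))
mixing = rng.normal(size=(8, 32))
x = latent @ mixing + 0.05 * rng.normal(size=(500, 32))

# Compare a bottleneck below, at, and above the true latent dimension.
errors = {k: linear_bottleneck_error(x, k) for k in (4, 8, 16)}
for k, err in errors.items():
    print(f"bottleneck={k}: mse={err:.4f}")
```

In this toy setup the error drops sharply once the bottleneck reaches the number of underlying factors, and further increases mostly fit noise. A nonlinear autoencoder will not behave identically, but the same pattern is a reasonable guide when choosing `encoding_dim`.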