NumericalAnomalyDetection
Overview
The NumericalAnomalyDetection layer learns a distribution for each numerical feature and outputs an anomaly score for each feature based on how far it deviates from the learned distribution. It uses a combination of mean, variance, and autoencoder reconstruction error to detect anomalies.
This layer is particularly powerful for identifying outliers in numerical data, providing a comprehensive approach that combines statistical and neural network-based anomaly detection methods.
How It Works
The NumericalAnomalyDetection layer processes data through a multi-component anomaly detection system:
- Autoencoder Processing: Encodes and decodes features through a neural network
- Reconstruction Error: Computes reconstruction error for each feature
- Distribution Learning: Learns mean and variance for each feature
- Distribution Error: Computes distribution-based error
- Anomaly Scoring: Combines reconstruction and distribution errors
- Output Generation: Produces anomaly scores for each feature
```mermaid
graph TD
    A[Input Features] --> B[Autoencoder Encoder]
    B --> C[Autoencoder Decoder]
    C --> D[Reconstruction Error]
    A --> E[Distribution Learning]
    E --> F[Mean Learning]
    E --> G[Variance Learning]
    F --> H[Distribution Error]
    G --> H
    D --> I[Anomaly Scoring]
    H --> I
    I --> J[Anomaly Scores]

    style A fill:#e6f3ff,stroke:#4a86e8
    style J fill:#e8f5e9,stroke:#66bb6a
    style B fill:#fff9e6,stroke:#ffb74d
    style C fill:#fff9e6,stroke:#ffb74d
    style E fill:#f3e5f5,stroke:#9c27b0
    style I fill:#e1f5fe,stroke:#03a9f4
```
Why Use This Layer?
| Challenge | Traditional Approach | NumericalAnomalyDetection's Solution |
|---|---|---|
| Outlier Detection | Statistical methods only | Combined statistical and neural-network approach |
| Feature-Specific | Global anomaly detection | Per-feature anomaly scoring |
| Reconstruction Error | No reconstruction learning | Autoencoder-based reconstruction error |
| Distribution Learning | Fixed distributions | Learned distributions for each feature |
Use Cases
- Outlier Detection: Identifying outliers in numerical features
- Data Quality: Ensuring data quality through anomaly detection
- Feature Analysis: Analyzing feature-level anomalies
- Autoencoder Applications: Using autoencoders for anomaly detection
- Distribution Learning: Learning feature distributions
Quick Start
Basic Usage
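A minimal sketch of standalone usage, assuming the layer accepts a 2D float tensor of numerical features and returns one anomaly score per feature (same shape as the input):

```python
import numpy as np
from kerasfactory.layers import NumericalAnomalyDetection

# A small batch of 8 samples with 4 numerical features
x = np.random.normal(size=(8, 4)).astype("float32")

# hidden_dims is required; the two error weights default to 0.5 each
layer = NumericalAnomalyDetection(hidden_dims=[16, 8])
scores = layer(x)

print(scores.shape)  # assumed to match the input: (8, 4)
```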
In a Sequential Model
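A sketch of placing the layer inside `keras.Sequential`, assuming it can sit directly on normalized numerical inputs:

```python
import keras
from kerasfactory.layers import NumericalAnomalyDetection

model = keras.Sequential(
    [
        keras.layers.Input(shape=(4,)),
        NumericalAnomalyDetection(hidden_dims=[16, 8]),
    ]
)
model.summary()
```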
In a Functional Model
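A functional-API sketch. The `Lambda` reduction from per-feature scores to a single per-sample score is an illustrative choice, not part of the layer:

```python
import keras
from kerasfactory.layers import NumericalAnomalyDetection

inputs = keras.Input(shape=(4,))
per_feature = NumericalAnomalyDetection(hidden_dims=[16, 8])(inputs)

# Collapse per-feature scores into one score per sample (illustrative)
per_sample = keras.layers.Lambda(
    lambda s: keras.ops.mean(s, axis=-1, keepdims=True)
)(per_feature)

model = keras.Model(inputs, [per_feature, per_sample])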
Advanced Configuration
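An advanced-configuration sketch: a deeper autoencoder and a reconstruction-heavy error mix (the two weights still sum to 1.0), followed by simple quantile thresholding. The thresholding is an assumption about how one might consume the scores, not part of the layer:

```python
import numpy as np
from kerasfactory.layers import NumericalAnomalyDetection

layer = NumericalAnomalyDetection(
    hidden_dims=[32, 16, 8],    # deeper autoencoder for more complex data
    reconstruction_weight=0.7,  # emphasize reconstruction error
    distribution_weight=0.3,    # de-emphasize distribution error
)

x = np.random.normal(size=(256, 10)).astype("float32")
x[0] += 8.0  # inject an obvious outlier row

scores = np.asarray(layer(x))

# Flag the top 1% of per-feature scores as anomalous
threshold = np.quantile(scores, 0.99)
print("flagged cells:", int((scores > threshold).sum()))
```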
API Reference
kerasfactory.layers.NumericalAnomalyDetection
Classes
NumericalAnomalyDetection
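The constructor signature, reconstructed from the parameter table below (the `keras.layers.Layer` base class is an assumption):

```python
from typing import Any

import keras

class NumericalAnomalyDetection(keras.layers.Layer):
    def __init__(
        self,
        hidden_dims: list[int],
        reconstruction_weight: float = 0.5,
        distribution_weight: float = 0.5,
        **kwargs: Any,
    ) -> None: ...
```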
Numerical anomaly detection layer for identifying outliers in numerical features.
This layer learns a distribution for each numerical feature and outputs an anomaly score for each feature based on how far it deviates from the learned distribution. The layer uses a combination of mean, variance, and autoencoder reconstruction error to detect anomalies.
Example
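A minimal sketch of the documented example, assuming per-feature scores with the same shape as the input:

```python
import numpy as np
from kerasfactory.layers import NumericalAnomalyDetection

x = np.random.normal(size=(32, 5)).astype("float32")
layer = NumericalAnomalyDetection(hidden_dims=[16, 8])
scores = layer(x)
print(scores.shape)  # assumed: (32, 5), one score per feature
```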
Initialize the layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
hidden_dims |
list[int]
|
List of hidden dimensions for the autoencoder. |
required |
reconstruction_weight |
float
|
Weight for reconstruction error in anomaly score. |
0.5
|
distribution_weight |
float
|
Weight for distribution-based error in anomaly score. |
0.5
|
**kwargs |
dict[str, Any]
|
Additional keyword arguments. |
{}
|
Functions
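The method signature, reconstructed from the parameter and return tables below; the name is assumed from the standard Keras layer API:

```python
def compute_output_shape(self, input_shape: tuple[int, ...]) -> tuple[int, ...]: ...
```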
Compute output shape.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `tuple[int, ...]` | Input shape tuple. | *required* |

Returns:

| Type | Description |
|---|---|
| `tuple[int, ...]` | Output shape tuple. |
Parameters Deep Dive
hidden_dims (list)
- Purpose: List of hidden dimensions for the autoencoder
- Range: [4, 2] to [128, 64, 32] (typically [16, 8] or [32, 16])
- Impact: Larger values = more complex autoencoder but more parameters
- Recommendation: Start with [16, 8], scale based on data complexity
reconstruction_weight (float)
- Purpose: Weight for reconstruction error in anomaly score
- Range: 0.0 to 1.0 (typically 0.3-0.7)
- Impact: Higher values = more emphasis on reconstruction error
- Recommendation: Use 0.3-0.7 based on data characteristics
distribution_weight (float)
- Purpose: Weight for distribution-based error in anomaly score
- Range: 0.0 to 1.0 (typically 0.3-0.7)
- Impact: Higher values = more emphasis on distribution error
- Recommendation: Use 0.3-0.7; together with reconstruction_weight the two weights should sum to 1.0 (see the sketch below)
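Putting the two weights together: the anomaly score is presumably a weighted sum of the two error terms. A sketch of that assumed scoring rule, with hypothetical names `recon_error` and `dist_error`:

```python
import numpy as np

def combine_errors(
    recon_error: np.ndarray,
    dist_error: np.ndarray,
    reconstruction_weight: float = 0.5,
    distribution_weight: float = 0.5,
) -> np.ndarray:
    """Assumed scoring rule: weighted sum of the two per-feature errors."""
    return reconstruction_weight * recon_error + distribution_weight * dist_error

# With the default 0.5/0.5 split, both error sources contribute equally.
print(combine_errors(np.array([0.2, 1.5]), np.array([0.1, 2.0])))  # [0.15 1.75]
```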
Performance Characteristics
- Speed: Fast for small to medium models; scales with hidden dimensions
- Memory: Moderate memory usage due to the autoencoder
- Accuracy: Excellent for numerical anomaly detection
- Best For: Numerical data with potential outliers
Examples
Example 1: Outlier Detection
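A sketch of end-to-end outlier flagging on synthetic data. The layer's statistics and autoencoder are learned, so in practice you would score data only after the model containing the layer has been trained; the forward pass here just illustrates the shapes and the ranking logic:

```python
import numpy as np
from kerasfactory.layers import NumericalAnomalyDetection

rng = np.random.default_rng(42)

# Mostly well-behaved data with a few extreme values injected into feature 0
x = rng.normal(0.0, 1.0, size=(1000, 6)).astype("float32")
x[:10, 0] += 10.0

layer = NumericalAnomalyDetection(hidden_dims=[16, 8])
scores = np.asarray(layer(x))

# Rank samples by their worst-feature anomaly score
sample_scores = scores.max(axis=1)
top10 = np.argsort(sample_scores)[-10:]
print("most anomalous rows:", sorted(top10.tolist()))
```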
Example 2: Anomaly Analysis
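A sketch of per-feature anomaly analysis: wrap the layer in a small model and aggregate its scores feature by feature. The use of `predict` and the mean aggregation are illustrative choices:

```python
import numpy as np
import keras
from kerasfactory.layers import NumericalAnomalyDetection

inputs = keras.Input(shape=(6,))
scores = NumericalAnomalyDetection(hidden_dims=[16, 8])(inputs)
scorer = keras.Model(inputs, scores)

x = np.random.normal(size=(500, 6)).astype("float32")
per_feature = scorer.predict(x, verbose=0)

# Which features drive the anomalies on average?
for i, mean_score in enumerate(per_feature.mean(axis=0)):
    print(f"feature {i}: mean anomaly score {mean_score:.4f}")
```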
Example 3: Reconstruction Analysis
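A sketch of isolating the autoencoder component: set reconstruction_weight to 1.0 and distribution_weight to 0.0 so the score reflects reconstruction error alone:

```python
import numpy as np
from kerasfactory.layers import NumericalAnomalyDetection

# Score entirely from reconstruction error (weights still sum to 1.0)
recon_only = NumericalAnomalyDetection(
    hidden_dims=[32, 16],
    reconstruction_weight=1.0,
    distribution_weight=0.0,
)

x = np.random.normal(size=(200, 8)).astype("float32")
scores = np.asarray(recon_only(x))
print("mean reconstruction-driven score:", float(scores.mean()))
```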
Tips & Best Practices
- Hidden Dimensions: Start with [16, 8], scale based on data complexity
- Weight Balance: Balance reconstruction and distribution weights
- Feature Normalization: Works best with normalized input features
- Anomaly Threshold: Set appropriate thresholds for anomaly detection
- Autoencoder Training: Ensure the autoencoder is well trained before relying on its scores
- Distribution Learning: Monitor distribution learning progress
Common Pitfalls
- Hidden Dimensions: Must be positive integers
- Weight Sum: Reconstruction and distribution weights should sum to 1.0
- Memory Usage: Scales with hidden dimensions
- Overfitting: Monitor for overfitting with complex autoencoders
- Anomaly Threshold: May need tuning for different datasets
Related Layers
- CategoricalAnomalyDetectionLayer - Categorical anomaly detection
- BusinessRulesLayer - Business rules validation
- FeatureCutout - Feature regularization
- DistributionAwareEncoder - Distribution-aware encoding
Further Reading
- Anomaly Detection - Anomaly detection concepts
- Autoencoders - Autoencoder concepts
- Outlier Detection - Outlier detection techniques
- KerasFactory Layer Explorer - Browse all available layers
- Feature Engineering Tutorial - Complete guide to feature engineering