🔍 KerasFactory Data Analyzer
The KerasFactory Data Analyzer is a utility that analyzes your tabular data and recommends KerasFactory layers suited to your specific dataset.
Smart Recommendations
Just provide your CSV file, and the analyzer will suggest the most appropriate layers based on your data characteristics!
✨ Features
- 📊 Automatic Analysis: Analyzes single CSV files or entire directories
- 🎯 Feature Detection: Identifies numerical, categorical, date, and text features
- 🔍 Data Insights: Detects high cardinality, missing values, correlations, and patterns
- 🧩 Layer Recommendations: Suggests suitable KerasFactory layers for your data
- 🔧 Extensible: Add custom recommendation rules
- 💻 CLI & API: Command-line interface and Python API
- 📈 Performance Tips: Guidance on layer configuration and optimization
🚀 Installation
The Data Analyzer is included with the KerasFactory package.
💻 Usage
🖥️ Command-line Interface
The Data Analyzer can also be run from the command line.
🐍 Python API
You can also call the Data Analyzer from your Python code.
Data Characteristics
The analyzer identifies the following data characteristics:
- continuous_features: Numerical features
- categorical_features: Categorical features
- date_features: Date and time features
- text_features: Text features
- high_cardinality_categorical: Categorical features with high cardinality
- high_missing_value_features: Features with many missing values
- feature_interaction: Highly correlated feature pairs
- time_series: Date features that may indicate time series data
- general_tabular: General tabular data characteristics
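The sketch below shows one way simple heuristics could sort the columns of a CSV into several of the characteristics above, using only the standard library. The detect_characteristics function and all three thresholds are hypothetical, and the real analyzer's rules may differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical thresholds, chosen for illustration only.
HIGH_CARDINALITY_THRESHOLD = 50  # distinct values above which a categorical column is "high cardinality"
HIGH_MISSING_RATIO = 0.5         # share of empty cells above which a column has "many missing values"
TEXT_MIN_MEAN_WORDS = 3.0        # mean words per value at or above which a string column is "text"


def _is_float(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def _is_date(value: str) -> bool:
    try:
        datetime.fromisoformat(value)  # accepts ISO 8601 strings such as 2024-01-05
    except ValueError:
        return False
    return True


def detect_characteristics(csv_text: str) -> dict[str, list[str]]:
    """Group the columns of a CSV string by heuristic data characteristic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    result: dict[str, list[str]] = {
        "continuous_features": [],
        "categorical_features": [],
        "date_features": [],
        "text_features": [],
        "high_cardinality_categorical": [],
        "high_missing_value_features": [],
    }
    for col in reader.fieldnames or []:
        raw = [(row[col] or "").strip() for row in rows]  # short rows yield None
        present = [v for v in raw if v]
        if raw and (len(raw) - len(present)) / len(raw) > HIGH_MISSING_RATIO:
            result["high_missing_value_features"].append(col)
        if not present:
            continue  # nothing left to infer a type from
        if all(_is_float(v) for v in present):
            result["continuous_features"].append(col)
        elif all(_is_date(v) for v in present):
            result["date_features"].append(col)
        elif sum(len(v.split()) for v in present) / len(present) >= TEXT_MIN_MEAN_WORDS:
            result["text_features"].append(col)
        else:
            result["categorical_features"].append(col)
            if len(set(present)) > HIGH_CARDINALITY_THRESHOLD:
                result["high_cardinality_categorical"].append(col)
    return result
```

The checks run in a fixed order (numeric, then date, then text). As a result, a numeric ID column is reported as continuous. This is the kind of misclassification the caveats below warn about.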
Layer Recommendations
For each data characteristic, the analyzer recommends appropriate KerasFactory layers along with descriptions and use cases.
Example
For continuous features, the following layers might be recommended:
- AdvancedNumericalEmbedding: Embeds continuous features using both MLP and discretization approaches
- DifferentialPreprocessingLayer: Applies various normalizations and transformations to numerical features
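A recommendation step can be pictured as a lookup from detected characteristics to candidate layers. In this minimal sketch, LayerRecommendation, RECOMMENDATIONS and recommend are hypothetical names. The table contains only the two continuous-feature layers listed above, with their descriptions.

```python
from typing import NamedTuple


class LayerRecommendation(NamedTuple):
    layer: str
    description: str


# Hypothetical lookup table; a complete one would cover every characteristic.
RECOMMENDATIONS: dict[str, list[LayerRecommendation]] = {
    "continuous_features": [
        LayerRecommendation(
            "AdvancedNumericalEmbedding",
            "Embeds continuous features using both MLP and discretization approaches",
        ),
        LayerRecommendation(
            "DifferentialPreprocessingLayer",
            "Applies various normalizations and transformations to numerical features",
        ),
    ],
}


def recommend(
    characteristics: dict[str, list[str]],
) -> dict[str, list[LayerRecommendation]]:
    """Return suggestions for each characteristic that has at least one matching column."""
    return {
        name: RECOMMENDATIONS[name]
        for name, columns in characteristics.items()
        if columns and name in RECOMMENDATIONS
    }
```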
Extending Layer Recommendations
You can extend the layer recommendations by registering new layers.
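As a rough picture of what registration involves, the hypothetical LayerRegistry below stores a layer name, description and use case for each data characteristic. It is not the KerasFactory registration API. MyHashedEmbedding in the usage lines is an invented custom layer name.

```python
from collections import defaultdict


class LayerRegistry:
    """Hypothetical registry mapping a data characteristic to layer suggestions."""

    def __init__(self) -> None:
        # characteristic name -> list of (layer name, description, use case)
        self._layers: dict[str, list[tuple[str, str, str]]] = defaultdict(list)

    def register(self, characteristic: str, layer: str, description: str, use_case: str) -> None:
        entries = self._layers[characteristic]
        # Ignore duplicates so registering the same layer twice is harmless.
        if any(existing[0] == layer for existing in entries):
            return
        entries.append((layer, description, use_case))

    def layers_for(self, characteristic: str) -> list[str]:
        # .get() avoids creating empty entries for unknown characteristics.
        return [layer for layer, _, _ in self._layers.get(characteristic, [])]


registry = LayerRegistry()
registry.register(
    "high_cardinality_categorical",
    "MyHashedEmbedding",  # hypothetical custom layer
    "Hashes categories into a fixed-size embedding table",
    "ID-like columns with many distinct values",
)
```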
Example Script
Check out the example script at examples/data_analyzer_example.py for a complete demonstration.
Output Format
The analyzer returns its results as a dictionary.
Caveats
- The analyzer relies on heuristics to identify feature types, which may not always be accurate.
- Recommendations are based on general patterns and may need adjustment for specific use cases.
- Performance may degrade with very large CSV files due to memory constraints.
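One common way to limit memory use on large files is to profile the CSV in a single streaming pass instead of loading it whole. The sketch below illustrates that idea under stated assumptions and does not describe how the analyzer reads files. The stream_profile function and its max_distinct cap are hypothetical.

```python
import csv


def stream_profile(path: str, max_distinct: int = 1000) -> dict[str, dict[str, int]]:
    """Profile a CSV file in one pass without loading the whole file.

    For each column, counts data rows, empty cells and distinct non-empty values.
    Distinct tracking stops at max_distinct (a hypothetical cap) so that
    high-cardinality columns cannot exhaust memory.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        missing = [0] * len(header)
        distinct: list[set[str]] = [set() for _ in header]
        rows = 0
        for row in reader:
            if not row:
                continue  # skip blank lines
            rows += 1
            for i in range(len(header)):
                value = row[i].strip() if i < len(row) else ""  # pad short rows
                if not value:
                    missing[i] += 1
                elif len(distinct[i]) < max_distinct:
                    distinct[i].add(value)
    return {
        name: {"rows": rows, "missing": missing[i], "distinct": len(distinct[i])}
        for i, name in enumerate(header)
    }
```

Distinct counts stop growing once a column reaches max_distinct. A reported value equal to the cap therefore means "at least that many".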