🔧 Utils API Reference
Welcome to the KMR Utilities documentation! This page covers the KMR utility functions and tools, including the powerful Data Analyzer, which can recommend appropriate layers for your tabular data.
What You'll Find Here
Each utility includes detailed documentation with:

- ✨ Complete parameter descriptions with types and defaults
- 🎯 Usage examples showing real-world applications
- ⚡ Best practices and performance considerations
- 🎨 When-to-use guidance for each utility
- 🔧 Implementation notes for developers
Smart Data Analysis
The Data Analyzer can automatically analyze your CSV files and recommend the best KMR layers for your specific dataset.
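A minimal sketch of that workflow, assuming a local CSV file at data/tabular_data.csv (the full API is documented below):

```python
from kmr.utils.data_analyzer import DataAnalyzer

# Analyze a CSV file and get layer recommendations in a single call
analyzer = DataAnalyzer()
result = analyzer.analyze_and_recommend("data/tabular_data.csv")
print(result)
```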
CLI Integration
Use the command-line interface for quick data analysis and layer recommendations.
🔍 Data Analyzer
🧠 DataAnalyzer
Intelligent data analyzer that examines CSV files and recommends appropriate KMR layers based on data characteristics.
kmr.utils.data_analyzer.DataAnalyzer
DataAnalyzer()
Analyzes tabular data and recommends appropriate KMR layers.
This class provides methods to analyze CSV files, extract statistics, and recommend layers from the Keras Model Registry based on data characteristics.
Attributes:

| Name | Type | Description |
|---|---|---|
| registrations | `dict[str, list[tuple[str, str, str]]]` | Dictionary mapping data characteristics to recommended layer classes. |
Initialize the data analyzer with layer registrations.
Functions
register_recommendation
register_recommendation(characteristic, layer_name, description, use_case)
Register a new layer recommendation for a specific data characteristic.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| characteristic | `str` | The data characteristic identifier (e.g., 'continuous_features') | required |
| layer_name | `str` | The name of the layer class | required |
| description | `str` | Brief description of the layer | required |
| use_case | `str` | When to use this layer | required |
analyze_csv
analyze_csv(filepath)
Analyze a single CSV file and return statistics.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filepath | `str` | Path to the CSV file | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary containing dataset statistics and characteristics |
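A minimal sketch of analyzing a single file (the exact keys in the returned statistics dictionary depend on the dataset):

```python
from kmr.utils.data_analyzer import DataAnalyzer

analyzer = DataAnalyzer()

# Collect statistics for one CSV file
stats = analyzer.analyze_csv("data/tabular_data.csv")
print(stats)
```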
analyze_directory
analyze_directory(directory_path, pattern='*.csv')
Analyze all CSV files in a directory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| directory_path | `str` | Path to the directory containing CSV files | required |
| pattern | `str` | Glob pattern to match files | `'*.csv'` |

Returns:

| Type | Description |
|---|---|
| `dict[str, dict[str, Any]]` | Dictionary mapping filenames to their analysis results |
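For example, a short sketch assuming a `data/` directory of CSV files:

```python
from kmr.utils.data_analyzer import DataAnalyzer

analyzer = DataAnalyzer()

# One statistics dictionary per matching file
per_file_stats = analyzer.analyze_directory("data/", pattern="*.csv")
for filename, stats in per_file_stats.items():
    print(filename, stats)
```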
recommend_layers
recommend_layers(stats)
Recommend layers based on data statistics.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| stats | `dict[str, Any]` | Dictionary of dataset statistics from analyze_csv | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, list[tuple[str, str, str]]]` | Dictionary mapping characteristics to recommended layers |
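The return value groups recommendations by data characteristic, and each entry is a (layer_name, description, use_case) tuple. A minimal sketch:

```python
from kmr.utils.data_analyzer import DataAnalyzer

analyzer = DataAnalyzer()
stats = analyzer.analyze_csv("data/tabular_data.csv")

# Walk the recommendations grouped by characteristic
for characteristic, layers in analyzer.recommend_layers(stats).items():
    for layer_name, description, use_case in layers:
        print(f"{characteristic}: {layer_name} ({use_case})")
```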
analyze_and_recommend
analyze_and_recommend(source, pattern='*.csv')
Analyze data and provide layer recommendations.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source | `str` | Path to file or directory to analyze | required |
| pattern | `str` | File pattern if source is a directory | `'*.csv'` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with analysis results and recommendations |
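A short sketch of the directory form, assuming a `data/` folder of CSV files:

```python
from kmr.utils.data_analyzer import DataAnalyzer

analyzer = DataAnalyzer()

# Analysis plus recommendations for every matching file in one call
results = analyzer.analyze_and_recommend("data/", pattern="*.csv")
print(results)
```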
📋 Usage Examples
Basic Data Analysis
from kmr.utils.data_analyzer import DataAnalyzer

# Initialize the analyzer
analyzer = DataAnalyzer()

# Analyze a CSV file
stats = analyzer.analyze_csv("data/tabular_data.csv")
print("Data statistics:", stats)

# Get layer recommendations based on those statistics
recommendations = analyzer.recommend_layers(stats)
print("Recommended layers:", recommendations)
Advanced Analysis with Custom Recommendations

from kmr.utils.data_analyzer import DataAnalyzer

# Initialize the analyzer
analyzer = DataAnalyzer()

# Register an additional recommendation for a data characteristic
analyzer.register_recommendation(
    characteristic="continuous_features",
    layer_name="AdvancedNumericalEmbedding",
    description="Learned embeddings for continuous numerical features",
    use_case="Datasets dominated by continuous numerical columns",
)

# Analyze a file and get recommendations in a single call
result = analyzer.analyze_and_recommend("data/large_dataset.csv")
print(result)
Batch Analysis of Multiple Files
from kmr.utils.data_analyzer import DataAnalyzer

analyzer = DataAnalyzer()

# Analyze every CSV file in a directory
all_stats = analyzer.analyze_directory("data/", pattern="*.csv")

# Compare recommendations across datasets
for filename, stats in all_stats.items():
    recommendations = analyzer.recommend_layers(stats)
    print(f"{filename}: {recommendations}")
💻 DataAnalyzerCLI
Command-line interface for the data analyzer, allowing easy analysis of datasets from the terminal.
kmr.utils.data_analyzer_cli
Command-line interface for the Keras Model Registry Data Analyzer.
This script provides a convenient way to analyze CSV data and get layer recommendations from the command line.
Functions
parse_args
parse_args()
Parse command line arguments.
Returns:

| Type | Description |
|---|---|
| `Namespace` | Parsed arguments namespace |
setup_logging
setup_logging(verbose)
Configure logging based on verbosity.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| verbose | `bool` | Whether to enable verbose logging | required |
format_result
format_result(result, recommendations_only)
Format the result based on user preferences.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| result | `dict[str, Any]` | The analysis result | required |
| recommendations_only | `bool` | Whether to include only recommendations | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Formatted result dictionary |
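A hedged sketch of calling format_result directly on analyzer output (the structure of the formatted dictionary follows whatever analyze_and_recommend returns):

```python
from kmr.utils.data_analyzer import DataAnalyzer
from kmr.utils.data_analyzer_cli import format_result

analyzer = DataAnalyzer()
result = analyzer.analyze_and_recommend("data/tabular_data.csv")

# Keep only the recommendations in the formatted output
summary = format_result(result, recommendations_only=True)
print(summary)
```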
main
main()
Main entry point for the script.
🖥️ CLI Usage Examples
Basic CLI Analysis
# Analyze a single CSV file
kmr-analyze data/tabular_data.csv
# Analyze with verbose output
kmr-analyze data/tabular_data.csv --verbose
# Save results to file
kmr-analyze data/tabular_data.csv --output results.json
Advanced CLI Options
# Analyze an entire directory of CSV files and save the results
# (the --pattern and --output option names are assumptions based on the documented
# parameters; run kmr-analyze --help for the exact options in your installation)
kmr-analyze data/ --pattern "*.csv" --output batch_results.json

# Report only the layer recommendations, with verbose logging
# (--recommendations-only mirrors the documented recommendations_only parameter)
kmr-analyze data/large_dataset.csv --recommendations-only --verbose
Integration with Jupyter Notebooks
# In a Jupyter notebook you can shell out to the CLI and load its JSON output
import json
import subprocess

# Run the CLI analysis and write the result to a file
subprocess.run(
    ["kmr-analyze", "data/tabular_data.csv", "--output", "analysis.json"],
    capture_output=True,
    text=True,
    check=True,
)

# Load the results
with open("analysis.json") as f:
    analysis = json.load(f)

# Use the results in your notebook (the exact keys depend on the analyzer output)
print("Recommended layers:", analysis.get("recommendations"))
print("Data statistics:", analysis.get("statistics"))
🔄 Complete Workflow Example
End-to-End Data Analysis to Model Building
from kmr.utils.data_analyzer import DataAnalyzer
from kmr.layers import TabularAttention, AdvancedNumericalEmbedding
import keras

# Step 1: Analyze your data
analyzer = DataAnalyzer()
stats = analyzer.analyze_csv("data/my_dataset.csv")

# Step 2: Get recommendations and collect the recommended layer names
recommendations = analyzer.recommend_layers(stats)
recommended_layers = {
    layer_name
    for layers in recommendations.values()
    for layer_name, _, _ in layers
}
print("Recommended layers:", recommended_layers)

# Step 3: Build layers based on the recommendations
attention_layer = None
embedding_layer = None

if "TabularAttention" in recommended_layers:
    # Use tabular attention for feature relationships
    attention_layer = TabularAttention(
        num_heads=8,
        d_model=64,
        dropout_rate=0.1
    )

if "AdvancedNumericalEmbedding" in recommended_layers:
    # Use advanced embedding for numerical features
    embedding_layer = AdvancedNumericalEmbedding(
        embedding_dim=32,
        mlp_hidden_units=64,
        num_bins=20
    )

# Step 4: Create your model architecture
inputs = keras.Input(shape=(100, 20))  # Based on your data shape

# Apply the recommended layers
x = embedding_layer(inputs) if embedding_layer is not None else inputs
if attention_layer is not None:
    x = attention_layer(x)

# Add final layers
x = keras.layers.Dense(64, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

# Create and compile the model
model = keras.Model(inputs, outputs)
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("Model built with recommended KMR layers!")
model.summary()
Automated Model Architecture Selection
from kmr.utils.data_analyzer import DataAnalyzer
from kmr.layers import TabularAttention, AdvancedNumericalEmbedding, VariableSelection
import keras
import pandas as pd

def build_recommended_model(csv_file):
    """Automatically build a model based on data analysis."""
    # Analyze the data and collect the recommended layer names
    analyzer = DataAnalyzer()
    stats = analyzer.analyze_csv(csv_file)
    recommendations = analyzer.recommend_layers(stats)
    recommended_layers = {
        layer_name
        for layers in recommendations.values()
        for layer_name, _, _ in layers
    }

    # Get the number of features from the file itself
    num_features = pd.read_csv(csv_file, nrows=1).shape[1]

    # Build the model based on the recommendations
    inputs = keras.Input(shape=(num_features,))
    x = inputs
    for layer_name in recommended_layers:
        if layer_name == "TabularAttention":
            x = TabularAttention(num_heads=4, d_model=32)(x)
        elif layer_name == "AdvancedNumericalEmbedding":
            x = AdvancedNumericalEmbedding(embedding_dim=16)(x)
        elif layer_name == "VariableSelection":
            x = VariableSelection(nr_features=num_features, units=32)(x)
        # Add more layer mappings as needed

    # Add final layers
    x = keras.layers.Dense(32, activation='relu')(x)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    model = keras.Model(inputs, outputs)
    return model, recommended_layers

# Use the function
model, recommended_layers = build_recommended_model("data/my_dataset.csv")
print("Automatically built model with layers:", recommended_layers)
🎨 Decorators
✨ Decorators
Utility decorators for common functionality in KMR components and enhanced development experience.
kmr.utils.decorators
Functions
log_init
log_init(cls)
Class decorator to log initialization arguments.
log_method
log_method(func)
Method decorator to log method calls with their arguments.
log_property
log_property(func)
Property decorator to log property access.
add_serialization
add_serialization(cls)
Decorator to add serialization methods to a Keras model class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cls | `T` | The class to decorate. | required |

Returns:

| Type | Description |
|---|---|
| `T` | The decorated class. |
🔧 Usage Examples
Initialization Logging Decorator

from kmr.utils.decorators import log_init
import keras

@log_init
class CustomLayer(keras.layers.Layer):
    """Layer whose constructor arguments are logged automatically."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
Method Call Logging Decorator

from kmr.utils.decorators import log_method

class DataProcessor:
    @log_method
    def expensive_computation(self, data):
        """Method whose calls are logged with their arguments."""
        # Your computation here
        return sum(data)
Serialization Helper Decorator

from kmr.utils.decorators import add_serialization
import keras

@add_serialization
class CustomModel(keras.Model):
    """Keras model with serialization methods added by the decorator."""

    def __init__(self, param1, param2, **kwargs):
        super().__init__(**kwargs)
        self.param1 = param1
        self.param2 = param2
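There is also a log_property decorator for logging property access. A minimal sketch, assuming it can be applied directly to a getter method (check the decorator's docstring for whether it should instead be stacked with the built-in @property):

```python
from kmr.utils.decorators import log_property

class ConfiguredLayer:
    def __init__(self, units):
        self._units = units

    @log_property
    def units(self):
        """Each access to this property is logged."""
        return self._units
```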