🔧 Utils API Reference

Welcome to the KMR Utilities documentation! This page documents the KMR utility functions and tools, including the Data Analyzer, which recommends appropriate layers for your tabular data.

What You'll Find Here

Each utility includes detailed documentation with:

- ✨ Complete parameter descriptions with types and defaults
- 🎯 Usage examples showing real-world applications
- ⚡ Best practices and performance considerations
- 🎨 "When to use" guidance for each utility
- 🔧 Implementation notes for developers

Smart Data Analysis

The Data Analyzer can automatically analyze your CSV files and recommend the best KMR layers for your specific dataset.

CLI Integration

Use the command-line interface for quick data analysis and layer recommendations.

🔍 Data Analyzer

🧠 DataAnalyzer

Intelligent data analyzer that examines CSV files and recommends appropriate KMR layers based on data characteristics.

kmr.utils.data_analyzer.DataAnalyzer

DataAnalyzer()

Analyzes tabular data and recommends appropriate KMR layers.

This class provides methods to analyze CSV files, extract statistics, and recommend layers from the Keras Model Registry based on data characteristics.

Attributes:

Name | Type | Description
--- | --- | ---
registrations | dict[str, list[tuple[str, str, str]]] | Dictionary mapping data characteristics to recommended layer classes.

Initialize the data analyzer with layer registrations.

Functions

register_recommendation

register_recommendation(
    characteristic, layer_name, description, use_case
)

Register a new layer recommendation for a specific data characteristic.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
characteristic | str | The data characteristic identifier (e.g., 'continuous_features') | required
layer_name | str | The name of the layer class | required
description | str | Brief description of the layer | required
use_case | str | When to use this layer | required
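
The registration mechanism can be pictured with a small stand-in. This is a sketch only, not the KMR implementation: `RecommendationRegistry` is a hypothetical class, and the description and use-case strings are made up for illustration. It assumes each registered entry is stored as a `(layer_name, description, use_case)` tuple, which matches the `registrations` type shown above.

```python
from collections import defaultdict


class RecommendationRegistry:
    """Hypothetical stand-in for the analyzer's `registrations` attribute."""

    def __init__(self) -> None:
        # characteristic -> list of (layer_name, description, use_case) tuples
        self.registrations = defaultdict(list)

    def register_recommendation(
        self, characteristic: str, layer_name: str, description: str, use_case: str
    ) -> None:
        # Append, so one characteristic can map to several layers
        self.registrations[characteristic].append(
            (layer_name, description, use_case)
        )


registry = RecommendationRegistry()
registry.register_recommendation(
    "continuous_features",
    "AdvancedNumericalEmbedding",
    "Embeds numerical features",  # illustrative text only
    "Datasets dominated by continuous columns",  # illustrative text only
)
print(dict(registry.registrations))
```

Appending rather than overwriting lets several layers be recommended for the same characteristic.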

analyze_csv

analyze_csv(filepath)

Analyze a single CSV file and return statistics.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
filepath | str | Path to the CSV file | required

Returns:

Type | Description
--- | ---
dict[str, Any] | Dictionary containing dataset statistics and characteristics
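
The keys of the returned dictionary are not listed here. As a rough illustration of what a statistics dictionary for a CSV file could contain, the sketch below classifies columns using only the standard library. The function name `sketch_analyze_csv` and the keys `row_count`, `continuous_features`, and `categorical_features` are assumptions for this example, not the documented output of `analyze_csv`.

```python
import csv
import tempfile
from pathlib import Path


def sketch_analyze_csv(filepath):
    """Illustrative only: split columns into numeric and non-numeric."""
    with open(filepath, newline="") as handle:
        rows = list(csv.DictReader(handle))
    columns = list(rows[0].keys()) if rows else []

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # A column counts as continuous only if every value parses as a float
    numeric = [c for c in columns if all(is_number(r[c]) for r in rows)]
    categorical = [c for c in columns if c not in numeric]
    return {
        "row_count": len(rows),
        "continuous_features": numeric,
        "categorical_features": categorical,
    }


# Build a tiny CSV in a temporary directory so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"
    path.write_text("age,city\n31,Paris\n45,Lima\n")
    stats = sketch_analyze_csv(str(path))

print(stats)
```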

analyze_directory

analyze_directory(directory_path, pattern='*.csv')

Analyze all CSV files in a directory.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
directory_path | str | Path to the directory containing CSV files | required
pattern | str | Glob pattern to match files | '*.csv'

Returns:

Type | Description
--- | ---
dict[str, dict[str, Any]] | Dictionary mapping filenames to their analysis results
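
The directory variant can be read as a glob over the directory plus one per-file analysis for each match. The sketch below shows that shape with `pathlib`. `sketch_analyze_directory` and `count_lines` are hypothetical helpers, and the per-file result is deliberately trivial. How KMR actually walks the directory is not stated on this page.

```python
import tempfile
from pathlib import Path


def sketch_analyze_directory(directory_path, analyze_one, pattern="*.csv"):
    """Map each matching filename to the result of `analyze_one`."""
    results = {}
    # sorted() keeps the output order stable across platforms
    for path in sorted(Path(directory_path).glob(pattern)):
        results[path.name] = analyze_one(str(path))
    return results


def count_lines(filepath):
    """Hypothetical per-file analysis: count the lines in the file."""
    with open(filepath) as handle:
        return {"line_count": sum(1 for _ in handle)}


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.csv").write_text("x\n1\n2\n")
    (Path(tmp) / "b.csv").write_text("y\n3\n")
    (Path(tmp) / "notes.txt").write_text("ignored\n")  # does not match *.csv
    directory_results = sketch_analyze_directory(tmp, count_lines)

print(directory_results)
```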

recommend_layers

recommend_layers(stats)

Recommend layers based on data statistics.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
stats | dict[str, Any] | Dictionary of dataset statistics from analyze_csv | required

Returns:

Type | Description
--- | ---
dict[str, list[tuple[str, str, str]]] | Dictionary mapping characteristics to recommended layers
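
One plausible reading of this method is a lookup: keep the registered recommendations whose characteristic appears in the statistics. The sketch below implements that reading under stated assumptions. `sketch_recommend_layers`, the statistic keys, and `HypotheticalCategoryLayer` are illustrative names, and the real matching rules may be more involved.

```python
def sketch_recommend_layers(stats, registrations):
    """Keep only registrations whose characteristic is present in stats."""
    return {
        characteristic: list(layers)
        for characteristic, layers in registrations.items()
        # Truthy means a non-empty list, True, or a non-zero count
        if stats.get(characteristic)
    }


registrations = {
    "continuous_features": [
        ("AdvancedNumericalEmbedding", "Embeds numerical features", "Many continuous columns"),
    ],
    "categorical_features": [
        ("HypotheticalCategoryLayer", "Placeholder entry", "Illustration only"),
    ],
}
stats = {"continuous_features": ["age", "income"], "categorical_features": []}

recommended = sketch_recommend_layers(stats, registrations)
print(recommended)
```

Because the dataset in this example has no categorical columns, only the `continuous_features` entry survives the filter.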

analyze_and_recommend

analyze_and_recommend(source, pattern='*.csv')

Analyze data and provide layer recommendations.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
source | str | Path to file or directory to analyze | required
pattern | str | File pattern if source is a directory | '*.csv'

Returns:

Type | Description
--- | ---
dict[str, Any] | Dictionary with analysis results and recommendations

📋 Usage Examples

Basic Data Analysis

from kmr.utils.data_analyzer import DataAnalyzer

# Initialize the analyzer
analyzer = DataAnalyzer()

# Analyze a CSV file (returns a dictionary of dataset statistics)
stats = analyzer.analyze_csv("data/tabular_data.csv")

# Get layer recommendations for those statistics
recommendations = analyzer.recommend_layers(stats)
print("Recommended layers:", recommendations)

# Inspect the statistics themselves
print("Data insights:", stats)

Advanced Analysis with Custom Recommendations

from kmr.utils.data_analyzer import DataAnalyzer

# DataAnalyzer() takes no constructor arguments; customize it by
# registering extra recommendations instead
analyzer = DataAnalyzer()
analyzer.register_recommendation(
    characteristic="continuous_features",
    layer_name="AdvancedNumericalEmbedding",
    description="Embeds numerical features",
    use_case="Datasets dominated by continuous columns",
)

# Analyze a file (or a directory) and get recommendations in one call
results = analyzer.analyze_and_recommend("data/large_dataset.csv")
print("Analysis and recommendations:", results)

# Or run the two steps separately to inspect each recommendation
stats = analyzer.analyze_csv("data/large_dataset.csv")
for characteristic, layers in analyzer.recommend_layers(stats).items():
    # Assumes each tuple follows register_recommendation's argument order
    for layer_name, description, use_case in layers:
        print(f"{characteristic}: {layer_name} ({use_case})")

Batch Analysis of Multiple Files

from kmr.utils.data_analyzer import DataAnalyzer

analyzer = DataAnalyzer()

# Analyze every CSV file in the directory; returns {filename: statistics}
all_stats = analyzer.analyze_directory("data/", pattern="*.csv")

# Turn each file's statistics into layer recommendations
all_results = {
    filename: analyzer.recommend_layers(stats)
    for filename, stats in all_stats.items()
}

# Compare recommendations across datasets
for file, recommendations in all_results.items():
    print(f"{file}: {recommendations}")

💻 DataAnalyzerCLI

Command-line interface for the data analyzer, allowing easy analysis of datasets from the terminal.

kmr.utils.data_analyzer_cli

Command-line interface for the Keras Model Registry Data Analyzer.

This script provides a convenient way to analyze CSV data and get layer recommendations from the command line.

Functions

parse_args

parse_args()

Parse command line arguments.

Returns:

Type | Description
--- | ---
Namespace | Parsed arguments namespace
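
For readers unfamiliar with `argparse`, the sketch below shows how such a function typically produces a `Namespace`. Every argument here is an assumption: `source` and `--pattern` mirror `analyze_and_recommend`, `--verbose` and `--recommendations-only` mirror the parameters of `setup_logging` and `format_result`, and `--output` is a guess. The real CLI may define different flags. Unlike the documented `parse_args()`, this version accepts an optional `argv` list so it can be exercised without a terminal.

```python
import argparse


def sketch_parse_args(argv=None):
    """Hypothetical parser; the flag names are assumptions, not the KMR CLI."""
    parser = argparse.ArgumentParser(
        description="Analyze CSV data and recommend layers (illustrative)."
    )
    parser.add_argument("source", help="Path to a CSV file or a directory")
    parser.add_argument("--pattern", default="*.csv", help="Glob for directories")
    parser.add_argument("--output", default=None, help="Optional output file path")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--recommendations-only", action="store_true")
    # argv=None makes argparse read sys.argv[1:], as a real CLI would
    return parser.parse_args(argv)


args = sketch_parse_args(["data/", "--pattern", "*.csv", "--recommendations-only"])
print(args)
```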

setup_logging

setup_logging(verbose)

Configure logging based on verbosity.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
verbose | bool | Whether to enable verbose logging | required
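
A common way to implement this kind of switch is to pick a log level from the flag and pass it to `logging.basicConfig`. The sketch below assumes DEBUG for verbose mode and INFO otherwise. Those levels and the format string are illustrative choices, not confirmed KMR behavior, and the real function may not return anything.

```python
import logging


def sketch_setup_logging(verbose):
    """Pick a level from the flag and configure the root logger."""
    level = logging.DEBUG if verbose else logging.INFO
    # force=True (Python 3.8+) replaces existing handlers so repeated
    # calls take effect
    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level


verbose_level = sketch_setup_logging(verbose=True)
quiet_level = sketch_setup_logging(verbose=False)
print(logging.getLevelName(verbose_level), logging.getLevelName(quiet_level))
```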

format_result

format_result(result, recommendations_only)

Format the result based on user preferences.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
result | dict[str, Any] | The analysis result | required
recommendations_only | bool | Whether to include only recommendations | required

Returns:

Type | Description
--- | ---
dict[str, Any] | Formatted result dictionary
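
The filtering step can be sketched as follows. The top-level keys `statistics` and `recommendations` are taken from the notebook example on this page. Whether the real function uses exactly this layout is an assumption, and `sketch_format_result` is a hypothetical name.

```python
def sketch_format_result(result, recommendations_only):
    """Return the full result, or just its recommendations."""
    if recommendations_only:
        # Drop everything except the recommendations section
        return {"recommendations": result.get("recommendations", {})}
    return result


full_result = {
    "statistics": {"row_count": 2},
    "recommendations": {"continuous_features": [("SomeLayer", "desc", "use")]},
}

trimmed = sketch_format_result(full_result, recommendations_only=True)
untouched = sketch_format_result(full_result, recommendations_only=False)
print(trimmed)
```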

main

main()

Main entry point for the script.

🖥️ CLI Usage Examples

Basic CLI Analysis

# Analyze a single CSV file
kmr-analyze data/tabular_data.csv

# Analyze with verbose output
kmr-analyze data/tabular_data.csv --verbose

# Save results to file
kmr-analyze data/tabular_data.csv --output results.json

Advanced CLI Options

# Analyze with custom parameters
kmr-analyze data/large_dataset.csv \
    --sample-size 5000 \
    --correlation-threshold 0.8 \
    --output detailed_analysis.json \
    --format json

# Analyze multiple files
kmr-analyze data/*.csv --batch --output batch_results.json

# Get specific layer recommendations
kmr-analyze data/tabular_data.csv --layers attention,embedding

Integration with Jupyter Notebooks

# In a Jupyter notebook, you can use the CLI output
import json
import subprocess

# Run CLI analysis; check=True raises CalledProcessError if the command fails
result = subprocess.run([
    'kmr-analyze', 'data/tabular_data.csv',
    '--output', 'analysis.json', '--format', 'json'
], capture_output=True, text=True, check=True)

# Load results
with open('analysis.json', 'r') as f:
    analysis = json.load(f)

# Use results in your notebook
print("Recommended layers:", analysis['recommendations'])
print("Data statistics:", analysis['statistics'])

🔄 Complete Workflow Example

End-to-End Data Analysis to Model Building

from kmr.utils.data_analyzer import DataAnalyzer
from kmr.layers import TabularAttention, AdvancedNumericalEmbedding
import keras

# Step 1: Analyze your data
analyzer = DataAnalyzer()
stats = analyzer.analyze_csv("data/my_dataset.csv")

# Step 2: Get recommendations
# recommend_layers maps each data characteristic to a list of tuples;
# this assumes the tuple order (layer_name, description, use_case)
recommendations = analyzer.recommend_layers(stats)
recommended_names = {
    layer_name
    for layers in recommendations.values()
    for layer_name, _description, _use_case in layers
}
print("Recommended layers:", sorted(recommended_names))

# Step 3: Build model based on recommendations
attention_layer = None
embedding_layer = None

if "TabularAttention" in recommended_names:
    # Use tabular attention for feature relationships
    attention_layer = TabularAttention(
        num_heads=8,
        d_model=64,
        dropout_rate=0.1
    )

if "AdvancedNumericalEmbedding" in recommended_names:
    # Use advanced embedding for numerical features
    embedding_layer = AdvancedNumericalEmbedding(
        embedding_dim=32,
        mlp_hidden_units=64,
        num_bins=20
    )

# Step 4: Create your model architecture
inputs = keras.Input(shape=(100, 20))  # Based on your data shape

# Apply recommended layers
x = inputs
if embedding_layer is not None:
    x = embedding_layer(x)

if attention_layer is not None:
    x = attention_layer(x)
# Add final layers
x = keras.layers.Dense(64, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

# Create and compile model
model = keras.Model(inputs, outputs)
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("Model built with recommended KMR layers!")
model.summary()

Automated Model Architecture Selection

from kmr.utils.data_analyzer import DataAnalyzer
from kmr.layers import (
    AdvancedNumericalEmbedding,
    TabularAttention,
    VariableSelection,
)
import keras
import pandas as pd

def build_recommended_model(csv_file):
    """Automatically build a model based on data analysis."""

    # Analyze data
    analyzer = DataAnalyzer()
    stats = analyzer.analyze_csv(csv_file)
    recommendations = analyzer.recommend_layers(stats)

    # Collect layer names, assuming each tuple is
    # (layer_name, description, use_case)
    layer_names = [
        layer_name
        for layers in recommendations.values()
        for layer_name, _description, _use_case in layers
    ]

    # Read only the header row to count columns
    # (this count includes any target column in the file)
    num_features = pd.read_csv(csv_file, nrows=0).shape[1]

    # Build model based on recommendations
    inputs = keras.Input(shape=(num_features,))

    # Apply recommended layers
    x = inputs
    for layer_name in layer_names:
        if layer_name == "TabularAttention":
            x = TabularAttention(num_heads=4, d_model=32)(x)
        elif layer_name == "AdvancedNumericalEmbedding":
            x = AdvancedNumericalEmbedding(embedding_dim=16)(x)
        elif layer_name == "VariableSelection":
            x = VariableSelection(nr_features=num_features, units=32)(x)
        # Add more layer mappings as needed

    # Add final layers
    x = keras.layers.Dense(32, activation='relu')(x)
    outputs = keras.layers.Dense(1, activation='sigmoid')(x)

    model = keras.Model(inputs, outputs)
    return model, recommendations

# Use the function
model, recommendations = build_recommended_model("data/my_dataset.csv")
print("Automatically built model with layers:", recommendations)

🎨 Decorators

✨ Decorators

Utility decorators for common functionality in KMR components and enhanced development experience.

kmr.utils.decorators

Functions

log_init

log_init(cls)

Class decorator to log initialization arguments.
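
A class decorator of this kind usually wraps `__init__`. The sketch below shows the general pattern with the standard library. `sketch_log_init` and `Projection` are hypothetical names, and the log message format is an assumption rather than what KMR emits.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sketch")


def sketch_log_init(cls):
    """Wrap cls.__init__ so each construction logs its arguments."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def logged_init(self, *args, **kwargs):
        logger.info(
            "Initializing %s with args=%r kwargs=%r", cls.__name__, args, kwargs
        )
        original_init(self, *args, **kwargs)

    cls.__init__ = logged_init
    return cls  # return the same class so isinstance checks keep working


@sketch_log_init
class Projection:
    def __init__(self, units, activation="relu"):
        self.units = units
        self.activation = activation


layer = Projection(16, activation="tanh")
```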

log_method

log_method(func)

Method decorator to log method calls with their arguments.
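
The method-level equivalent wraps a single function. The following is a minimal sketch of the pattern, not the KMR source. `sketch_log_method` and `Scaler` are hypothetical names.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("sketch")


def sketch_log_method(func):
    """Log the receiver's class, the method name, and the call arguments."""

    @functools.wraps(func)  # preserve the method's name and docstring
    def wrapper(self, *args, **kwargs):
        logger.debug(
            "Calling %s.%s args=%r kwargs=%r",
            type(self).__name__, func.__name__, args, kwargs,
        )
        return func(self, *args, **kwargs)

    return wrapper


class Scaler:
    @sketch_log_method
    def scale(self, value, factor=2):
        """Multiply value by factor."""
        return value * factor


scaled = Scaler().scale(3, factor=4)
print(scaled)
```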

log_property

log_property(func)

Property decorator to log property access.

add_serialization

add_serialization(cls)

Decorator to add serialization methods to a Keras model class.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
cls | T | The class to decorate. | required

Returns:

Type | Description
--- | ---
T | The decorated class.
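
Which methods the decorator adds is not stated here. Keras serialization conventionally relies on `get_config` and `from_config`, so the sketch below attaches those two to a plain class. That choice is an assumption, and `sketch_add_serialization` and `TinyModel` are hypothetical names. The sketch also assumes every constructor parameter is stored as an attribute of the same name.

```python
import inspect


def sketch_add_serialization(cls):
    """Attach get_config/from_config based on the __init__ signature."""
    param_names = [
        name
        for name in inspect.signature(cls.__init__).parameters
        if name != "self"
    ]

    def get_config(self):
        # Relies on each constructor argument being saved under its own name
        return {name: getattr(self, name) for name in param_names}

    def from_config(klass, config):
        return klass(**config)

    cls.get_config = get_config
    cls.from_config = classmethod(from_config)
    return cls


@sketch_add_serialization
class TinyModel:
    def __init__(self, hidden_units, dropout=0.1):
        self.hidden_units = hidden_units
        self.dropout = dropout


config = TinyModel(32).get_config()
restored = TinyModel.from_config(config)
print(config)
```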

🔧 Usage Examples

Method Logging Decorator

from kmr.utils.decorators import log_method

class CustomLayer:
    @log_method
    def call(self, inputs, training=None):
        """Layer call whose invocations and arguments are logged."""
        # Your layer logic here
        processed_outputs = inputs
        return processed_outputs

Initialization Logging Decorator

from kmr.utils.decorators import log_init

@log_init
class ExpensiveComputation:
    """Class whose initialization arguments are logged."""
    def __init__(self, data):
        self.data = data

Serialization Helper Decorator

from kmr.utils.decorators import add_serialization
import keras

@add_serialization
class CustomModel(keras.Model):
    """Model with serialization methods added by the decorator."""
    def __init__(self, param1, param2, **kwargs):
        super().__init__(**kwargs)
        self.param1 = param1
        self.param2 = param2