Assets API 📦
First-class citizens for data and models with automatic metadata extraction.
Quick Start
Dataset - Auto-Extracted Statistics
from flowyml import Dataset
import pandas as pd
df = pd.DataFrame(...)
# Auto-extracts: samples, features, columns, column_stats
ds = Dataset.create(data=df, name="my_data")
# Access auto-extracted properties
print(ds.num_samples) # Number of rows
print(ds.num_features) # Number of columns
print(ds.feature_columns) # Column names
print(ds.column_stats) # Per-column statistics
# Convenience methods
ds = Dataset.from_csv("data.csv", name="my_data")
ds = Dataset.from_parquet("data.parquet", name="my_data")
Model - Auto-Extracted Metadata
from flowyml import Model
# Auto-extracts: framework, parameters, layers, optimizer, etc.
model = Model.create(data=keras_model, name="my_model")
# Access auto-extracted properties
print(model.framework) # 'keras', 'pytorch', 'sklearn'
print(model.parameters) # Total parameter count
print(model.num_layers) # Number of layers
print(model.optimizer) # Optimizer name (Keras)
print(model.hyperparameters) # Hyperparameters (sklearn)
# Convenience methods
model = Model.from_keras(keras_model, name="my_model", callback=flowyml_callback)
model = Model.from_pytorch(pytorch_model, name="my_model")
model = Model.from_sklearn(sklearn_model, name="my_model")
Supported Frameworks (Model)
| Framework | Detection | Auto-Extraction Level |
|---|---|---|
| Keras/TensorFlow | ✅ | Full (layers, optimizer, loss, metrics) |
| PyTorch | ✅ | Full (layers, device, dtype, params) |
| Scikit-learn | ✅ | Full (hyperparams, feature importance) |
| XGBoost | ✅ | Full (trees, hyperparams) |
| LightGBM | ✅ | Full (trees, hyperparams) |
| CatBoost | ✅ | Good |
| Hugging Face | ✅ | Good (config, hidden_size) |
| Custom | ✅ | Basic (class name, has_fit/predict) |
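The table does not say how detection is performed. One plausible approach, sketched here as an assumption rather than as flowyml's implementation, is to look at the module in which the object's class (or one of its base classes) is defined, and to fall back to duck-typing for custom objects. `detect_framework` and the `_MODULE_PREFIXES` mapping are hypothetical names.

```python
from typing import Any

# Hypothetical mapping from top-level module name to a framework label.
_MODULE_PREFIXES = {
    "keras": "keras",
    "tensorflow": "keras",
    "torch": "pytorch",
    "sklearn": "sklearn",
    "xgboost": "xgboost",
    "lightgbm": "lightgbm",
    "catboost": "catboost",
    "transformers": "huggingface",
}


def detect_framework(obj: Any) -> dict[str, Any]:
    """Guess the framework of a model object from its class hierarchy."""
    for cls in type(obj).__mro__:
        root = cls.__module__.split(".")[0]
        if root in _MODULE_PREFIXES:
            return {
                "framework": _MODULE_PREFIXES[root],
                "class_name": type(obj).__name__,
            }
    # Custom object: record only basic, duck-typed information.
    return {
        "framework": "custom",
        "class_name": type(obj).__name__,
        "has_fit": callable(getattr(obj, "fit", None)),
        "has_predict": callable(getattr(obj, "predict", None)),
    }


class MyModel:
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


info = detect_framework(MyModel())
```

Walking the method resolution order means a subclass of a framework class is still attributed to that framework, with the most specific match winning.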
Supported Data Types (Dataset)
| Type | Auto-Extraction |
|---|---|
| Pandas DataFrame | Full (columns, stats, dtypes) |
| NumPy array | Full (shape, dtype, stats) |
| Python dict | Full (keys as columns, stats) |
| TensorFlow Dataset | Good (element_spec, cardinality) |
| List of dicts | Full (columns from keys, stats) |
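To make "columns from keys, stats" concrete, the following standard-library sketch pivots a list of row dicts into columns and computes a few numeric statistics per column. It is an illustration only: `rows_to_columns` and `numeric_stats` are hypothetical helpers, and the statistics flowyml actually extracts may differ.

```python
import statistics
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a list of row dicts into a dict of column lists."""
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))  # missing keys become None
    return columns


def numeric_stats(values: list[Any]) -> dict[str, float]:
    """Compute simple statistics over the numeric, non-missing entries."""
    nums = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if not nums:
        return {}
    return {
        "count": len(nums),
        "mean": statistics.fmean(nums),
        "min": min(nums),
        "max": max(nums),
    }


rows = [{"x": 1, "y": 4.0}, {"x": 2, "y": 5.0}, {"x": 3}]
columns = rows_to_columns(rows)
stats = {name: numeric_stats(vals) for name, vals in columns.items()}
```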
Class Asset
Base class for all ML assets (datasets, models, features, etc.).
Assets are first-class objects in flowyml pipelines with full lineage tracking.
Source code in flowyml/assets/base.py
Attributes
properties: dict[str, Any]
property
Expose mutable properties stored in metadata.
tags: dict[str, str]
property
Expose mutable tags stored in metadata.
Functions
add_property(key: str, value: Any) -> None
add_tag(key: str, value: str) -> None
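The relationship between the `properties` and `tags` attributes and the `add_property` and `add_tag` methods can be pictured with a small stand-in class. `MiniAsset` and its `_metadata` layout are hypothetical; the sketch only assumes what the descriptions above state, namely that both attributes expose mutable dicts stored in metadata.

```python
from typing import Any


class MiniAsset:
    """Hypothetical stand-in for an asset; not the flowyml implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._metadata: dict[str, dict] = {"properties": {}, "tags": {}}

    @property
    def properties(self) -> dict[str, Any]:
        # Returns the stored dict itself, so in-place edits are visible.
        return self._metadata["properties"]

    @property
    def tags(self) -> dict[str, str]:
        return self._metadata["tags"]

    def add_property(self, key: str, value: Any) -> None:
        self.properties[key] = value

    def add_tag(self, key: str, value: str) -> None:
        self.tags[key] = value


asset = MiniAsset("my_data")
asset.add_property("num_samples", 100)
asset.add_tag("stage", "raw")
asset.properties["source"] = "data.csv"  # mutation through the property
```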
create(data: Any, name: str | None = None, version: str | None = None, parent: Optional[Asset] = None, **kwargs: Any) -> Asset
classmethod
Factory method to create an asset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | The actual data/object | required |
| `name` | `str \| None` | Asset name | `None` |
| `version` | `str \| None` | Asset version | `None` |
| `parent` | `Optional[Asset]` | Parent asset for lineage | `None` |
| `**kwargs` | `Any` | Additional metadata | `{}` |
Returns:
| Type | Description |
|---|---|
| `Asset` | New asset instance |
Source code in flowyml/assets/base.py
get_all_ancestors() -> set[Asset]
Get all ancestor assets.
Source code in flowyml/assets/base.py
get_all_descendants() -> set[Asset]
Get all descendant assets.
Source code in flowyml/assets/base.py
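Both traversal methods return a flat set. A minimal sketch of how such sets could be collected is shown below, assuming each asset has at most one parent (as the `parent` argument of `create` suggests) and keeps a list of children. The `Node` class and its `children` attribute are hypothetical.

```python
from typing import Optional


class Node:
    """Hypothetical asset node with a single parent and many children."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: list["Node"] = []
        if parent is not None:
            parent.children.append(self)

    def get_all_ancestors(self) -> set["Node"]:
        ancestors: set["Node"] = set()
        current = self.parent
        # The membership check guards against accidental cycles.
        while current is not None and current not in ancestors:
            ancestors.add(current)
            current = current.parent
        return ancestors

    def get_all_descendants(self) -> set["Node"]:
        descendants: set["Node"] = set()
        stack = list(self.children)
        while stack:
            node = stack.pop()
            if node not in descendants:
                descendants.add(node)
                stack.extend(node.children)
        return descendants


raw = Node("raw")
clean = Node("clean", parent=raw)
train = Node("train", parent=clean)
test = Node("test", parent=clean)
```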
get_hash() -> str
Generate hash of asset for caching/versioning.
Source code in flowyml/assets/base.py
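The hashing scheme itself is not documented here. As an illustration of what a caching/versioning hash typically requires (determinism and independence from key order), the sketch below hashes a JSON serialization of a few identifying fields. The choice of fields, JSON, and SHA-256 are all assumptions, and `metadata_hash` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any


def metadata_hash(name: str, version: str, properties: dict[str, Any]) -> str:
    """Hash identifying fields deterministically (hypothetical scheme)."""
    payload = json.dumps(
        {"name": name, "version": version, "properties": properties},
        sort_keys=True,  # key order must not affect the digest
        default=str,     # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


h1 = metadata_hash("my_data", "v1", {"rows": 3, "cols": 2})
h2 = metadata_hash("my_data", "v1", {"cols": 2, "rows": 3})
h3 = metadata_hash("my_data", "v2", {"rows": 3, "cols": 2})
```

Here `h1` and `h2` are equal because only the key order differs, while `h3` differs because the version changed.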
get_lineage(depth: int = -1) -> dict[str, Any]
Get asset lineage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `depth` | `int` | How many levels to traverse (-1 for all) | `-1` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Lineage tree as nested dict |
Source code in flowyml/assets/base.py
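The exact keys of the returned dict are not listed above. The following sketch shows one way a depth-limited lineage tree could be built as a nested dict. The `Node` class, the `name`/`parent` keys, and the convention that `depth=0` returns only the node itself are assumptions made for illustration.

```python
from typing import Any, Optional


class Node:
    """Hypothetical asset node; only name and parent are modelled."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent

    def get_lineage(self, depth: int = -1) -> dict[str, Any]:
        """Return this node and its ancestors as a nested dict.

        depth=-1 walks to the root; depth=0 returns only this node.
        """
        node: dict[str, Any] = {"name": self.name, "parent": None}
        if self.parent is not None and depth != 0:
            next_depth = depth - 1 if depth > 0 else -1
            node["parent"] = self.parent.get_lineage(next_depth)
        return node


raw = Node("raw")
clean = Node("clean", parent=raw)
train = Node("train", parent=clean)

full = train.get_lineage()
one_level = train.get_lineage(depth=1)
```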
to_dict() -> dict[str, Any]
Convert asset to dictionary.
Source code in flowyml/assets/base.py
Class Dataset
Bases: Asset
Dataset asset with automatic schema detection and statistics extraction.
The Dataset class automatically extracts statistics and metadata from various data formats, reducing boilerplate code and improving UX.
Supported formats
- pandas DataFrame: Auto-extracts columns, dtypes, statistics
- numpy array: Auto-extracts shape, dtype, statistics
- dict: Auto-extracts features/target structure, column stats
- TensorFlow Dataset: Auto-extracts element_spec, cardinality
- List of dicts: Converts to dict format and extracts stats
Example
# Minimal usage: stats are extracted automatically
import pandas as pd
df = pd.read_csv("data.csv")
dataset = Dataset.create(data=df, name="my_dataset")
print(dataset.num_samples)  # Auto-extracted
print(dataset.feature_columns)  # Auto-detected
# With dict format
data = {"features": {"x": [1, 2, 3], "y": [4, 5, 6]}, "target": [0, 1, 0]}
dataset = Dataset.create(data=data, name="my_dataset")
# All stats are computed automatically
Initialize Dataset with automatic statistics extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Dataset name | required |
| `version` | `str \| None` | Version string | `None` |
| `data` | `Any` | The actual data (DataFrame, array, dict, etc.) | `None` |
| `schema` | `Any \| None` | Optional schema definition | `None` |
| `location` | `str \| None` | Storage location/path | `None` |
| `parent` | `Asset \| None` | Parent asset for lineage | `None` |
| `tags` | `dict[str, str] \| None` | Metadata tags | `None` |
| `properties` | `dict[str, Any] \| None` | Additional properties (merged with auto-extracted) | `None` |
| `auto_extract_stats` | `bool` | Whether to automatically extract statistics | `True` |
Source code in flowyml/assets/dataset.py
Attributes
column_stats: dict[str, dict] | None
property
Get per-column statistics (auto-extracted).
columns: list[str] | None
property
Get all column names (auto-extracted or user-provided).
feature_columns: list[str] | None
property
Get list of feature column names (auto-extracted or user-provided).
framework: str | None
property
Get the data framework/format (auto-detected).
label_column: str | None
property
Get the label/target column name (auto-detected or user-provided).
num_features: int | None
property
Get number of features (auto-extracted or user-provided).
num_samples: int | None
property
Get number of samples (auto-extracted or user-provided).
size: int | None
property
Get dataset size if available.
Functions
__repr__() -> str
String representation with key stats.
Source code in flowyml/assets/dataset.py
create(data: Any, name: str, version: str | None = None, schema: Any | None = None, location: str | None = None, parent: Asset | None = None, tags: dict[str, str] | None = None, properties: dict[str, Any] | None = None, auto_extract_stats: bool = True, **kwargs: Any) -> Dataset
classmethod
Create a Dataset with automatic statistics extraction.
This is the preferred way to create Dataset objects. Statistics are automatically extracted from the data, reducing boilerplate code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | The actual data (DataFrame, array, dict, etc.) | required |
| `name` | `str` | Dataset name | required |
| `version` | `str \| None` | Version string (optional) | `None` |
| `schema` | `Any \| None` | Optional schema definition | `None` |
| `location` | `str \| None` | Storage location/path | `None` |
| `parent` | `Asset \| None` | Parent asset for lineage | `None` |
| `tags` | `dict[str, str] \| None` | Metadata tags | `None` |
| `properties` | `dict[str, Any] \| None` | Additional properties (merged with auto-extracted) | `None` |
| `auto_extract_stats` | `bool` | Whether to automatically extract statistics | `True` |
| `**kwargs` | `Any` | Additional properties to store | `{}` |
Returns:
| Type | Description |
|---|---|
| `Dataset` | Dataset instance with auto-extracted statistics |
Example
df = pd.read_csv("data.csv")
dataset = Dataset.create(data=df, name="my_data", source="data.csv")
# Stats are automatically extracted
Source code in flowyml/assets/dataset.py
from_csv(path: str, name: str | None = None, **kwargs: Any) -> Dataset
classmethod
Load a Dataset from a CSV file with automatic statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to CSV file | required |
| `name` | `str \| None` | Dataset name (defaults to filename) | `None` |
| `**kwargs` | `Any` | Additional properties | `{}` |
Returns:
| Type | Description |
|---|---|
| `Dataset` | Dataset with auto-extracted statistics |
Source code in flowyml/assets/dataset.py
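As a rough picture of what loading from a file involves, the standard-library sketch below reads a CSV file, derives a default name from the path, and reports the row count and column names. It does not use flowyml: `load_csv_summary` is a hypothetical helper, and interpreting "defaults to filename" as the file stem (without extension) is an assumption.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Optional


def load_csv_summary(path: str, name: Optional[str] = None) -> dict[str, Any]:
    """Read a CSV file and return a name plus basic shape information."""
    file_path = Path(path)
    with file_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return {
        # Assumption: the default name is the stem, without extension.
        "name": name if name is not None else file_path.stem,
        "num_samples": len(rows),
        "columns": columns,
    }


# Write a tiny CSV to a temporary directory so the sketch is runnable as-is.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("a,b\n1,2\n3,4\n")
    summary = load_csv_summary(str(csv_path))
```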
from_parquet(path: str, name: str | None = None, **kwargs: Any) -> Dataset
classmethod
Load a Dataset from a Parquet file with automatic statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to Parquet file | required |
| `name` | `str \| None` | Dataset name (defaults to filename) | `None` |
| `**kwargs` | `Any` | Additional properties | `{}` |
Returns:
| Type | Description |
|---|---|
| `Dataset` | Dataset with auto-extracted statistics |
Source code in flowyml/assets/dataset.py
get_column_stat(column: str, stat: str) -> Any
Get a specific statistic for a column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column` | `str` | Column name | required |
| `stat` | `str` | Statistic name (mean, std, min, max, median, count, unique) | required |
Returns:
| Type | Description |
|---|---|
| `Any` | The statistic value or None |
Source code in flowyml/assets/dataset.py
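Since `column_stats` is typed as `dict[str, dict] | None` and this method returns "the statistic value or None", the lookup can be pictured as a safe nested dictionary access. The standalone function below is a sketch under that assumption, not the library's code, and the sample statistics are made up for illustration.

```python
from typing import Any, Optional


def get_column_stat(
    column_stats: Optional[dict[str, dict[str, Any]]], column: str, stat: str
) -> Any:
    """Look up one statistic, returning None when anything is missing."""
    if not column_stats:
        return None
    return column_stats.get(column, {}).get(stat)


# Illustrative values only.
column_stats = {
    "age": {"mean": 31.5, "min": 18, "max": 64, "count": 200},
    "city": {"count": 200, "unique": 12},
}

age_mean = get_column_stat(column_stats, "age", "mean")
city_mean = get_column_stat(column_stats, "city", "mean")  # not defined for text
```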
split(train_ratio: float = 0.8, name_prefix: str | None = None, random_state: int | None = 42) -> tuple[Dataset, Dataset]
Split dataset into train/test with auto-extracted statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_ratio` | `float` | Ratio for training split | `0.8` |
| `name_prefix` | `str \| None` | Prefix for split dataset names | `None` |
| `random_state` | `int \| None` | Random seed for reproducibility | `42` |
Returns:
| Type | Description |
|---|---|
| `tuple[Dataset, Dataset]` | Tuple of (train_dataset, test_dataset) |
Source code in flowyml/assets/dataset.py
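The roles of `train_ratio` and `random_state` can be illustrated with a plain-Python split over a list of rows. Whether flowyml shuffles before splitting is not stated above, so the seeded shuffle here is an assumption, and `split_rows` is a hypothetical helper.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T], train_ratio: float = 0.8, random_state: Optional[int] = 42
) -> tuple[list[T], list[T]]:
    """Shuffle row indices with a seeded RNG, then cut at train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    indices = list(range(len(rows)))
    # A fixed seed gives the same permutation on every call;
    # None falls back to a nondeterministic seed.
    random.Random(random_state).shuffle(indices)
    cut = int(len(rows) * train_ratio)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


train, test = split_rows(list(range(10)))
```

With ten rows and the default ratio, the cut falls at index 8, so the split yields eight training rows and two test rows.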
validate_schema() -> bool
Class Model
Bases: Asset
Model asset with automatic metadata extraction and training history.
The Model class automatically extracts metadata from various ML frameworks, reducing boilerplate code and improving UX. It also captures training history for visualization in the FlowyML dashboard.
Supported frameworks
- Keras/TensorFlow: Auto-extracts layers, parameters, optimizer, loss
- PyTorch: Auto-extracts modules, parameters, training mode
- Scikit-learn: Auto-extracts hyperparameters, feature importance
- XGBoost/LightGBM: Auto-extracts trees, hyperparameters
Example
# Minimal usage: properties are auto-extracted
model_asset = Model.create(
    data=trained_keras_model,
    name="my_model",
)
print(model_asset.parameters)  # Auto-extracted
print(model_asset.framework)  # Auto-detected
# With a FlowyML callback: training history is captured automatically
callback = FlowymlKerasCallback(experiment_name="demo")
model.fit(X, y, callbacks=[callback])
model_asset = Model.create(
    data=model,
    name="trained_model",
    flowyml_callback=callback,  # Auto-extracts training history
)
Initialize Model with automatic metadata extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Model name | required |
| `version` | `str \| None` | Version string | `None` |
| `data` | `Any` | The model object (Keras, PyTorch, sklearn, etc.) | `None` |
| `architecture` | `str \| None` | Architecture name (auto-detected if not provided) | `None` |
| `framework` | `str \| None` | Framework name (auto-detected if not provided) | `None` |
| `input_shape` | `tuple \| None` | Input shape (auto-detected for Keras) | `None` |
| `output_shape` | `tuple \| None` | Output shape (auto-detected for Keras) | `None` |
| `trained_on` | `Asset \| None` | Dataset this model was trained on | `None` |
| `parent` | `Asset \| None` | Parent asset for lineage | `None` |
| `tags` | `dict[str, str] \| None` | Metadata tags | `None` |
| `properties` | `dict[str, Any] \| None` | Additional properties (merged with auto-extracted) | `None` |
| `training_history` | `dict[str, list] \| None` | Training metrics per epoch | `None` |
| `auto_extract` | `bool` | Whether to auto-extract model metadata | `True` |
Source code in flowyml/assets/model.py
Attributes
hyperparameters: dict | None
property
Get hyperparameters (auto-extracted from sklearn/xgboost).
layer_types: list[str] | None
property
Get list of layer types (auto-extracted).
learning_rate: float | None
property
Get learning rate (auto-extracted from Keras).
loss_function: str | None
property
Get loss function (auto-extracted from Keras).
metrics: list[str] | None
property
Get metrics (auto-extracted from Keras).
num_layers: int | None
property
Get number of layers (auto-extracted).
optimizer: str | None
property
Get optimizer name (auto-extracted from Keras).
parameters: int | None
property
Get number of model parameters (auto-extracted).
trainable_parameters: int | None
property
Get number of trainable parameters (auto-extracted).
Functions
__repr__() -> str
String representation with key info.
Source code in flowyml/assets/model.py
create(data: Any, name: str | None = None, version: str | None = None, parent: Asset | None = None, flowyml_callback: Any = None, keras_history: Any = None, auto_extract: bool = True, **kwargs: Any) -> Model
classmethod
Create a Model asset with automatic metadata extraction.
This is the preferred way to create Model objects. Metadata is automatically extracted from the model, and training history can be captured from FlowyML callbacks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | The model object (Keras, PyTorch, sklearn, etc.) | required |
| `name` | `str \| None` | Asset name (auto-generated if not provided) | `None` |
| `version` | `str \| None` | Asset version | `None` |
| `parent` | `Asset \| None` | Parent asset for lineage | `None` |
| `flowyml_callback` | `Any` | FlowymlKerasCallback for auto-capturing training history | `None` |
| `keras_history` | `Any` | Keras History object from model.fit() | `None` |
| `auto_extract` | `bool` | Whether to auto-extract model metadata | `True` |
| `**kwargs` | `Any` | Additional parameters, including `training_history` (dict of training metrics per epoch), `architecture` (model architecture name), `framework` (ML framework: keras, pytorch, etc.), `properties` (additional properties), and `tags` (metadata tags) | `{}` |
Returns:
| Type | Description |
|---|---|
| `Model` | New Model instance with auto-extracted metadata |
Example
# Simple usage: everything is auto-extracted
model_asset = Model.create(data=model, name="my_model")
# With a FlowyML callback
callback = FlowymlKerasCallback(experiment_name="demo")
model.fit(X, y, callbacks=[callback])
model_asset = Model.create(
    data=model,
    name="trained_model",
    flowyml_callback=callback,
)
# With a Keras History object
history = model.fit(X, y)
model_asset = Model.create(
    data=model,
    name="trained_model",
    keras_history=history,
)
Source code in flowyml/assets/model.py
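The `training_history` parameter is described as a dict of metrics per epoch, and a Keras `History` object stores the same kind of mapping in its `history` attribute. The sketch below shows how either input could be normalized into one shape. `normalize_history` and `FakeHistory` are hypothetical and only stand in for whatever conversion flowyml performs internally.

```python
from typing import Any


def normalize_history(history: Any) -> dict[str, list[float]]:
    """Coerce a History-like object or a plain dict into dict[str, list[float]]."""
    # Keras History objects keep per-epoch metrics in the .history attribute.
    raw = getattr(history, "history", history)
    if not isinstance(raw, dict):
        raise TypeError("expected a dict or an object with a .history dict")
    return {str(k): [float(v) for v in values] for k, values in raw.items()}


class FakeHistory:
    """Stand-in for a Keras History object, used only for this sketch."""

    def __init__(self) -> None:
        self.history = {"loss": [0.9, 0.5, 0.3], "accuracy": [0.6, 0.8, 0.9]}


training_history = normalize_history(FakeHistory())
```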
from_keras(model: Any, name: str | None = None, callback: Any = None, history: Any = None, **kwargs: Any) -> Model
classmethod
Create a Model asset from a Keras model with full auto-extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Any` | Keras model object | required |
| `name` | `str \| None` | Asset name | `None` |
| `callback` | `Any` | FlowymlKerasCallback for training history | `None` |
| `history` | `Any` | Keras History object from model.fit() | `None` |
| `**kwargs` | `Any` | Additional properties | `{}` |
Returns:
| Type | Description |
|---|---|
| `Model` | Model asset with auto-extracted Keras metadata |
Source code in flowyml/assets/model.py
from_pytorch(model: Any, name: str | None = None, training_history: dict | None = None, **kwargs: Any) -> Model
classmethod
Create a Model asset from a PyTorch model with full auto-extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Any` | PyTorch model object (nn.Module) | required |
| `name` | `str \| None` | Asset name | `None` |
| `training_history` | `dict \| None` | Training metrics dict | `None` |
| `**kwargs` | `Any` | Additional properties | `{}` |
Returns:
| Type | Description |
|---|---|
| `Model` | Model asset with auto-extracted PyTorch metadata |
Source code in flowyml/assets/model.py
from_sklearn(model: Any, name: str | None = None, **kwargs: Any) -> Model
classmethod
Create a Model asset from a scikit-learn model with full auto-extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Any` | Scikit-learn model object | required |
| `name` | `str \| None` | Asset name | `None` |
| `**kwargs` | `Any` | Additional properties | `{}` |
Returns:
| Type | Description |
|---|---|
| `Model` | Model asset with auto-extracted sklearn metadata |
Source code in flowyml/assets/model.py
get_architecture_info() -> dict[str, Any]
Get architecture information.
Source code in flowyml/assets/model.py
get_parameters_count() -> int | None
get_training_datasets()
get_training_info() -> dict[str, Any]
Get training information.