Scikit-learn Integration
Bring scikit-learn models into production pipelines with automatic property extraction and 16+ auto-detected attributes.
Classic ML pipelines made robust and reproducible.
What you'll learn
How to version and deploy sklearn models with automatic metadata extraction. Turn notebook scripts into production pipelines.
Why Scikit-Learn + flowyml?
- Auto-Extracted Metadata: Hyperparameters, fitted state, and model-specific details such as feature-importance availability, coefficient shape, and class labels are captured automatically.
- Pipeline Versioning: Version the entire preprocessing + model chain.
- Model Registry: Promote the best Random Forest to production.
- Easy Serving: Deploy sklearn models as APIs.
Model.from_sklearn() Convenience Method
The easiest way to create Model assets with full metadata extraction:
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from flowyml import Model

# Example data so the snippet runs as-is (swap in your own)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train your model
rf = RandomForestClassifier(n_estimators=100, max_depth=10)
rf.fit(X_train, y_train)

# Use the convenience method for full metadata extraction
model_asset = Model.from_sklearn(
    rf,
    name="random_forest_classifier",
)

# Access auto-extracted properties
print(f"Framework: {model_asset.framework}")  # 'sklearn'
print(f"Class: {model_asset.metadata.properties.get('model_class')}")  # 'RandomForestClassifier'
print(f"Is Fitted: {model_asset.metadata.properties.get('is_fitted')}")  # True

# Hyperparameters are extracted automatically
print(f"Hyperparameters: {model_asset.hyperparameters}")
# {'n_estimators': 100, 'max_depth': 10, 'criterion': 'gini', ...}

# Feature importance metadata (tree-based models)
print(f"Has Feature Importance: {model_asset.metadata.properties.get('has_feature_importances')}")
print(f"Num Features: {model_asset.metadata.properties.get('num_features')}")
```
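The asset records whether importances exist (has_feature_importances) and how many features the model saw; the importance scores themselves stay on the fitted estimator. A minimal sketch that reads and ranks them with plain scikit-learn, using the iris dataset as stand-in data (no flowyml calls involved):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
rf.fit(iris.data, iris.target)

# One importance score per input column; for a random forest they sum to 1
ranked = sorted(
    zip(iris.feature_names, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Print features from most to least important
for name, score in ranked:
    print(f"{name:<20} {score:.3f}")

# n_features_in_ is the feature count scikit-learn records during fit
print(rf.n_features_in_)
```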
Pipeline Pattern
Return a sklearn.pipeline.Pipeline object for automatic serialization:
```python
from sklearn.pipeline import Pipeline as SkPipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from flowyml import step, Model

@step
def build_pipeline():
    return SkPipeline([
        ('scaler', StandardScaler()),
        ('rf', RandomForestClassifier(n_estimators=100)),
    ])

@step
def train(pipeline, X, y):
    pipeline.fit(X, y)
    # Return a Model asset with auto-extracted metadata
    return Model.from_sklearn(
        pipeline,
        name="trained_pipeline",
    )
```
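This page does not show how flowyml serializes the pipeline, so the sketch below uses a plain pickle round trip as a stand-in (an assumption, not flowyml's mechanism) to show that the scaler and the model travel as one object. It also shows two scikit-learn behaviors worth knowing when reading metadata from a pipeline: nested hyperparameters are named step__param, and tree attributes such as feature_importances_ live on the final step rather than on the Pipeline itself.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline as SkPipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipeline = SkPipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(X, y)

# Nested hyperparameters use the "<step>__<param>" naming convention
params = pipeline.get_params()
print(params["rf__n_estimators"])

# feature_importances_ is on the final step, not on the Pipeline object
final_step = pipeline.named_steps["rf"]
print(hasattr(pipeline, "feature_importances_"), final_step.feature_importances_.shape)

# Pickle round trip: the restored chain should predict exactly like the original
restored = pickle.loads(pickle.dumps(pipeline))
print(np.array_equal(restored.predict(X), pipeline.predict(X)))
```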
Auto-Extracted Properties
The following properties are automatically extracted from sklearn models:
| Property | Description |
|---|---|
| framework | Always 'sklearn' |
| model_class | Class name (RandomForestClassifier, etc.) |
| architecture | Same as model_class |
| hyperparameters | All model hyperparameters |
| is_fitted | Whether the model has been fitted |
| has_feature_importances | True for tree-based models |
| num_features | Number of features (if fitted) |
| n_features_in | Number of input features |
| n_estimators | Number of estimators (ensemble models) |
| num_estimators_fitted | Number of estimators actually fitted |
| max_depth | Maximum depth (tree-based models) |
| num_classes | Number of classes (classifiers) |
| classes | Class labels (classifiers) |
| coef_shape | Coefficient shape (linear models) |
| intercept | Intercept (linear models) |
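To make the table concrete, here is a minimal sketch of how each row could be derived from public scikit-learn attributes. The helper extract_properties is hypothetical and is not flowyml's implementation; in particular, filling num_features and n_features_in from the same attribute is an assumption.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def extract_properties(estimator: BaseEstimator) -> dict:
    """Hypothetical helper mapping public sklearn attributes to the table rows.

    Illustration only; this is not flowyml's actual extraction code.
    """
    try:
        check_is_fitted(estimator)
        is_fitted = True
    except NotFittedError:
        is_fitted = False

    model_class = type(estimator).__name__
    params = estimator.get_params(deep=False)
    props = {
        "framework": "sklearn",
        "model_class": model_class,
        "architecture": model_class,
        "hyperparameters": params,
        "is_fitted": is_fitted,
        # False for unfitted trees: the property raises NotFittedError (an AttributeError)
        "has_feature_importances": hasattr(estimator, "feature_importances_"),
    }

    # Assumption: num_features and n_features_in share one source here
    if hasattr(estimator, "n_features_in_"):
        props["n_features_in"] = props["num_features"] = estimator.n_features_in_

    # Ensemble and tree settings come from the constructor arguments
    for key in ("n_estimators", "max_depth"):
        if key in params:
            props[key] = params[key]
    if hasattr(estimator, "estimators_"):
        props["num_estimators_fitted"] = len(estimator.estimators_)

    # Classifier-only attributes
    if hasattr(estimator, "classes_"):
        props["classes"] = np.asarray(estimator.classes_).tolist()
        props["num_classes"] = len(props["classes"])

    # Linear-model attributes
    if hasattr(estimator, "coef_"):
        props["coef_shape"] = tuple(np.shape(estimator.coef_))
    if hasattr(estimator, "intercept_"):
        props["intercept"] = np.asarray(estimator.intercept_).tolist()

    return props


X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0).fit(X, y)
lr = LogisticRegression(max_iter=1000).fit(X, y)

rf_props = extract_properties(rf)
lr_props = extract_properties(lr)
print(rf_props["num_estimators_fitted"], rf_props["num_classes"])
print(lr_props["coef_shape"], lr_props["has_feature_importances"])
```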
Supported Model Types
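Auto-extraction works for all sklearn estimators. Family-specific properties (for example max_depth for tree-based models or coef_shape for linear models) are populated only for estimators that provide them: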
- Classifiers: RandomForest, GradientBoosting, SVM, LogisticRegression, etc.
- Regressors: RandomForest, Ridge, Lasso, ElasticNet, etc.
- Transformers: StandardScaler, PCA, etc.
- Ensembles: VotingClassifier, StackingClassifier, etc.
- Pipelines: sklearn.pipeline.Pipeline
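Which family-specific properties can be populated depends on the fitted attributes each estimator type exposes. A quick probe with plain scikit-learn shows the differences; the mapping from these attributes to flowyml properties is an assumption, not documented flowyml behavior.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One representative per estimator family listed above
estimators = {
    "classifier": LogisticRegression(max_iter=1000),
    "regressor": Ridge(),
    "transformer": StandardScaler(),
    "ensemble": VotingClassifier([
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ]),
    "pipeline": Pipeline([
        ("scaler", StandardScaler()),
        ("lr", LogisticRegression(max_iter=1000)),
    ]),
}

report = {}
for kind, est in estimators.items():
    est.fit(X, y)  # StandardScaler.fit accepts y and ignores it
    # Record which family-specific fitted attributes this estimator exposes
    report[kind] = {
        attr: hasattr(est, attr)
        for attr in ("classes_", "coef_", "feature_importances_")
    }
    print(f"{kind:<12} {report[kind]}")
```

A Pipeline forwards classes_ from its last step but not coef_ or feature_importances_; those are reachable through pipeline[-1] or named_steps.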
What's Next?
PyTorch Integration
Integrate deep learning models with automatic state dict handling and metadata.
HuggingFace Integration
Use Transformers models and datasets with full tokenizer and model artifact support.
Evaluations
Evaluate your sklearn models with built-in metrics and comparison tools.