PyTorch API Reference 📖
Complete API reference for MLPotion's PyTorch components.
Auto-Generated Documentation
This page is automatically populated with API documentation from the source code.
Extensibility
These components are built using protocol-based design, making MLPotion easy to extend. Want to add new data sources, training methods, or integrations? See the Contributing Guide.
Data Loading
mlpotion.frameworks.pytorch.data.datasets
Classes
CSVDataset
dataclass
Bases: Dataset[tuple[torch.Tensor, torch.Tensor] | torch.Tensor]
PyTorch Dataset for CSV files with on-demand tensor conversion.
This class loads CSV data into memory (using Pandas) and provides a map-style PyTorch Dataset. It supports filtering columns, separating labels, and efficient on-demand tensor conversion to minimize memory usage.
Attributes:

| Name | Type | Description |
|---|---|---|
| `file_pattern` | `str` | Glob pattern matching the CSV files to load. |
| `column_names` | `list[str] \| None` | Specific columns to load. If None, all columns are loaded. |
| `label_name` | `str \| None` | Name of the column to use as the label. If None, no labels are returned. |
| `dtype` | `torch.dtype` | The data type for the features. |
Example

```python
from mlpotion.frameworks.pytorch import CSVDataset
from torch.utils.data import DataLoader

# Create dataset
dataset = CSVDataset(
    file_pattern="data/train_*.csv",
    label_name="target_class",
    column_names=["feature1", "feature2", "target_class"],
)

# Create DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate
for features, labels in dataloader:
    print(features.shape, labels.shape)
```
Functions
__getitem__
```python
__getitem__(idx: int) -> tuple[torch.Tensor, torch.Tensor] | torch.Tensor
```
Get item at index.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `idx` | `int` | Global row index. | *required* |
Returns:

| Type | Description |
|---|---|
| `tuple[torch.Tensor, torch.Tensor] \| torch.Tensor` | `(features, label)` tuple if labels exist, else just features. |
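A minimal access sketch (assuming the `dataset` constructed in the example above, with a label column configured):

```python
# Map-style indexing returns a single (features, label) pair.
features, label = dataset[0]
print(features.dtype, label.shape)
```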
Source code in mlpotion/frameworks/pytorch/data/datasets.py
__len__
```python
__len__() -> int
```
Return dataset length.
Source code in mlpotion/frameworks/pytorch/data/datasets.py
__post_init__
```python
__post_init__() -> None
```
Eagerly load CSV files into a DataFrame and validate configuration.
Source code in mlpotion/frameworks/pytorch/data/datasets.py
StreamingCSVDataset
dataclass
Bases: IterableDataset[tuple[torch.Tensor, torch.Tensor] | torch.Tensor]
Streaming PyTorch IterableDataset for large CSV files.
This class is designed for data that is too large to fit in memory. It reads CSV files in chunks (using Pandas) and streams samples one by one, and it is compatible with PyTorch's IterableDataset interface.
Attributes:

| Name | Type | Description |
|---|---|---|
| `file_pattern` | `str` | Glob pattern matching the CSV files to load. |
| `column_names` | `list[str] \| None` | Specific columns to load. |
| `label_name` | `str \| None` | Name of the label column. |
| `chunksize` | `int` | Number of rows to read into memory at a time per file. |
| `dtype` | `torch.dtype` | The data type for the features. |
Example

```python
from mlpotion.frameworks.pytorch import StreamingCSVDataset
from torch.utils.data import DataLoader

# Create streaming dataset
dataset = StreamingCSVDataset(
    file_pattern="data/large_dataset_*.csv",
    label_name="target",
    chunksize=10000,
)

# Create DataLoader (shuffle must be False for IterableDataset)
dataloader = DataLoader(dataset, batch_size=64)

for features, labels in dataloader:
    # Train model...
    pass
```
Functions
__iter__
```python
__iter__() -> Iterator[tuple[torch.Tensor, torch.Tensor] | torch.Tensor]
```
Yield samples one by one across all CSV files.
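Because this is an `IterableDataset`, it can also be iterated directly, without a `DataLoader`; a minimal sketch, assuming the streaming `dataset` from the example above:

```python
# Samples stream one at a time across all matched CSV files.
for features, label in dataset:
    print(features.shape)
    break  # inspect only the first sample
```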
Source code in mlpotion/frameworks/pytorch/data/datasets.py
__post_init__
```python
__post_init__() -> None
```
Resolve files eagerly and log basic configuration.
Source code in mlpotion/frameworks/pytorch/data/datasets.py
mlpotion.frameworks.pytorch.data.loaders
Classes
CSVDataLoader
dataclass
Bases: Generic[T_co]
Factory for creating configured PyTorch DataLoaders.
This class simplifies the creation of torch.utils.data.DataLoader instances by
encapsulating common configuration options and handling differences between
map-style and iterable datasets (e.g., automatically disabling shuffling for iterables).
Attributes:

| Name | Type | Description |
|---|---|---|
| `batch_size` | `int` | Number of samples per batch. |
| `shuffle` | `bool` | Whether to shuffle the data (ignored for IterableDatasets). |
| `num_workers` | `int` | Number of subprocesses to use for data loading. |
| `pin_memory` | `bool` | Whether to copy tensors into CUDA pinned memory. |
| `drop_last` | `bool` | Whether to drop the last incomplete batch. |
| `persistent_workers` | `bool \| None` | Whether to keep workers alive between epochs. |
| `prefetch_factor` | `int \| None` | Number of batches loaded in advance by each worker. |
Example

```python
from mlpotion.frameworks.pytorch import CSVDataLoader, CSVDataset

# 1. Create a dataset
dataset = CSVDataset("data.csv", label_name="target")

# 2. Configure the loader factory
loader_factory = CSVDataLoader(
    batch_size=64,
    shuffle=True,
    num_workers=4,
    pin_memory=True,
)

# 3. Create the actual DataLoader
train_loader = loader_factory.load(dataset)

# 4. Use it
for X, y in train_loader:
    ...
```
Functions
load
```python
load(dataset: Dataset[T_co] | IterableDataset[T_co]) -> DataLoader[T_co]
```
Load a configured `DataLoader` from a dataset.

This method is aware of `IterableDataset` vs map-style `Dataset` and will:

- Disable shuffling for iterable datasets (with a warning if `shuffle=True` was requested).
- Apply worker-related options only when valid.
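A short sketch of the iterable-dataset behavior, assuming the `StreamingCSVDataset` and `CSVDataLoader` shown earlier:

```python
streaming = StreamingCSVDataset(
    file_pattern="data/large_dataset_*.csv",
    label_name="target",
)
factory = CSVDataLoader(batch_size=64, shuffle=True)  # shuffle requested

# Shuffling is disabled (with a warning) because an IterableDataset
# has no random-access indexing to shuffle over.
loader = factory.load(streaming)
```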
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset[T_co] \| IterableDataset[T_co]` | PyTorch map-style or iterable dataset to wrap. | *required* |
Returns:

| Type | Description |
|---|---|
| `DataLoader[T_co]` | Configured `DataLoader` instance. |
Source code in mlpotion/frameworks/pytorch/data/loaders.py
Note: `CSVDataset` and `StreamingCSVDataset` are also exposed by this module. Their API is identical to the versions documented under `mlpotion.frameworks.pytorch.data.datasets` above; see that section for details.
Training
mlpotion.frameworks.pytorch.training.trainers
PyTorch model training.
Classes
ModelTrainer
Bases: ModelTrainerProtocol[nn.Module, DataLoader]
Generic trainer for PyTorch models.
This class implements the ModelTrainerProtocol for PyTorch models. It handles the
training loop, device placement, loss calculation, backpropagation, and validation.
It supports:
- Supervised learning (batch is (inputs, targets)).
- Unsupervised/Self-supervised learning (batch is inputs only, loss is fn(outputs, inputs)).
- Custom loss functions (string alias, nn.Module, or callable).
- Automatic device management (CPU/GPU).
Attributes:

| Name | Type | Description |
|---|---|---|
| `model` | `nn.Module` | The PyTorch model to train. |
| `dataloader` | `DataLoader` | The training data loader. |
| `config` | `ModelTrainingConfig` | Configuration for training (epochs, optimizer, etc.). |
Example

```python
import torch
import torch.nn as nn
from mlpotion.frameworks.pytorch import ModelTrainer
from mlpotion.frameworks.pytorch.config import ModelTrainingConfig

# Define model
model = nn.Linear(10, 1)

# Define config
config = ModelTrainingConfig(
    epochs=5,
    learning_rate=0.01,
    optimizer="adam",
    loss_fn="mse",
    device="cpu",
)

# Initialize trainer
trainer = ModelTrainer()

# Train
result = trainer.train(model, train_loader, config, val_loader)
print(result.metrics)
```
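The unsupervised path works analogously: when a batch contains inputs only, the loss is computed as `fn(outputs, inputs)`. A sketch, assuming a hypothetical `unlabeled_loader` that yields feature tensors without targets (e.g., built from a `CSVDataset` created without `label_name`):

```python
# A tiny autoencoder: reconstructions are compared against the inputs.
autoencoder = nn.Sequential(
    nn.Linear(10, 4),   # encoder
    nn.Linear(4, 10),   # decoder
)

ae_config = ModelTrainingConfig(
    epochs=5,
    learning_rate=0.001,
    optimizer="adam",
    loss_fn="mse",   # reconstruction loss: fn(outputs, inputs)
    device="cpu",
)

result = trainer.train(autoencoder, unlabeled_loader, ae_config)
```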
Functions
train
```python
train(
    model: nn.Module,
    dataloader: DataLoader[Any],
    config: ModelTrainingConfig,
    validation_dataloader: DataLoader[Any] | None = None,
) -> TrainingResult[nn.Module]
```
Train a PyTorch model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `nn.Module` | The PyTorch model to train. | *required* |
| `dataloader` | `DataLoader[Any]` | The training data loader. | *required* |
| `config` | `ModelTrainingConfig` | Training configuration (epochs, optimizer, learning rate, loss function, device). | *required* |
| `validation_dataloader` | `DataLoader[Any] \| None` | Optional validation data loader. | `None` |
Returns:

| Type | Description |
|---|---|
| `TrainingResult[nn.Module]` | A dataclass containing the trained model, training history (loss/metrics per epoch), and final metrics. |
Raises:

| Type | Description |
|---|---|
| `TrainingError` | If the training loop encounters an error (e.g., NaN loss). |
Source code in mlpotion/frameworks/pytorch/training/trainers.py
Evaluation
mlpotion.frameworks.pytorch.evaluation.evaluators
PyTorch model evaluation.
Classes
ModelEvaluator
Bases: ModelEvaluatorProtocol[nn.Module, DataLoader]
Generic evaluator for PyTorch models.
This class implements the ModelEvaluatorProtocol for PyTorch models. It performs
a full pass over the evaluation dataset, computing the average loss.
It supports:

- Supervised and unsupervised evaluation.
- Custom loss functions.
- Automatic device management.
Example

```python
from mlpotion.frameworks.pytorch import ModelEvaluator
from mlpotion.frameworks.pytorch.config import ModelEvaluationConfig

evaluator = ModelEvaluator()
config = ModelEvaluationConfig(loss_fn="cross_entropy", device="cuda")
result = evaluator.evaluate(model, test_loader, config)
print(f"Test Loss: {result.metrics['loss']}")
```
Functions
evaluate
```python
evaluate(
    model: nn.Module,
    dataloader: DataLoader[Any],
    config: ModelEvaluationConfig,
) -> EvaluationResult
```
Evaluate a PyTorch model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `nn.Module` | The PyTorch model to evaluate. | *required* |
| `dataloader` | `DataLoader[Any]` | The evaluation data loader. | *required* |
| `config` | `ModelEvaluationConfig` | Evaluation configuration (loss function, device, etc.). | *required* |
Returns:

| Type | Description |
|---|---|
| `EvaluationResult` | A dataclass containing the computed metrics (e.g., average loss) and execution time. |
Raises:

| Type | Description |
|---|---|
| `EvaluationError` | If evaluation fails. |
Source code in mlpotion/frameworks/pytorch/evaluation/evaluators.py
Persistence
mlpotion.frameworks.pytorch.deployment.persistence
PyTorch model persistence.
Classes
ModelPersistence
dataclass
Bases: ModelPersistenceProtocol[nn.Module]
Persistence helper for PyTorch models.
This class manages saving and loading of PyTorch models. It supports two modes:
1. State Dict (Recommended): Saves only the model parameters (model.state_dict()).
Requires the model class to be available when loading.
2. Full Model: Saves the entire model object using pickle. Less portable but easier to load.
Attributes:

| Name | Type | Description |
|---|---|---|
| `path` | `str \| Path` | The file path for the model artifact. |
| `model` | `nn.Module \| None` | The PyTorch model instance. |
Example

Saving and Loading State Dict (Recommended):

```python
from mlpotion.frameworks.pytorch import ModelPersistence
import torch.nn as nn

# Define model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.l = nn.Linear(1, 1)

model = MyModel()

# Save
saver = ModelPersistence(path="model.pth", model=model)
saver.save(save_full_model=False)

# Load
loader = ModelPersistence(path="model.pth")
# We must provide the model class or an instance for state_dict loading
loaded_model = loader.load(model_class=MyModel)
```
Example

Saving and Loading Full Model:

```python
# Save
saver.save(save_full_model=True)

# Load (no model class needed)
loader = ModelPersistence(path="model.pth")
loaded_model = loader.load()
```
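To load onto a specific device, pass `map_location` through to `load`; a sketch, assuming a CUDA device is available:

```python
# Load the checkpoint directly onto the first GPU instead of the default CPU.
loader = ModelPersistence(path="model.pth")
loaded_model = loader.load(model_class=MyModel, map_location="cuda:0")
```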
Attributes
path_obj
property writable

```python
path_obj: Path
```

Return the model path as a `Path`.
Functions
load
```python
load(
    *,
    model_class: type[nn.Module] | None = None,
    map_location: str | torch.device | None = "cpu",
    strict: bool = True,
    model_kwargs: dict[str, Any] | None = None,
    **torch_load_kwargs: Any
) -> tuple[nn.Module, dict[str, Any] | None]
```
Load a PyTorch model from disk.
This method automatically detects if the file is a full model checkpoint or a state dict.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_class` | `type[nn.Module] \| None` | The model class to instantiate if loading a state dict and no model instance is currently attached. | `None` |
| `map_location` | `str \| torch.device \| None` | Device to load the model onto. | `'cpu'` |
| `strict` | `bool` | Whether to strictly enforce that state dict keys match the model. | `True` |
| `model_kwargs` | `dict[str, Any] \| None` | Arguments to pass to the `model_class` constructor. | `None` |
| `**torch_load_kwargs` | `Any` | Additional arguments passed to `torch.load`. | `{}` |
Returns:

| Type | Description |
|---|---|
| `tuple[nn.Module, dict[str, Any] \| None]` | The loaded PyTorch model, together with an optional dict. |
Raises:

| Type | Description |
|---|---|
| `ModelPersistenceError` | If loading fails, or if a state dict is found but neither a model instance nor `model_class` is available. |
Source code in mlpotion/frameworks/pytorch/deployment/persistence.py
save
```python
save(*, save_full_model: bool = False, **torch_save_kwargs: Any) -> None
```
Save the attached PyTorch model to disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `save_full_model` | `bool` | If True, saves the entire model object (pickle). If False (default), saves only the `state_dict()`. | `False` |
| `**torch_save_kwargs` | `Any` | Additional arguments passed to `torch.save`. | `{}` |
Raises:

| Type | Description |
|---|---|
| `ModelPersistenceError` | If no model is attached or saving fails. |
Source code in mlpotion/frameworks/pytorch/deployment/persistence.py
Export
mlpotion.frameworks.pytorch.deployment.exporters
Classes
ModelExporter
dataclass
Bases: ModelExporterProtocol[nn.Module]
Export PyTorch models to TorchScript, ONNX, or state_dict formats.
This class implements the ModelExporterProtocol for PyTorch. It supports exporting
models for deployment or interoperability.
Supported formats:
- torchscript: Exports via torch.jit.script or torch.jit.trace.
- onnx: Exports to ONNX format (requires example_input).
- state_dict: Saves the model parameters.
Example

```python
from mlpotion.frameworks.pytorch import ModelExporter
from mlpotion.frameworks.pytorch.config import ModelExportConfig
import torch

# Prepare model and input
model = ...
example_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
exporter = ModelExporter()
config = ModelExportConfig(
    export_path="models/model.onnx",
    format="onnx",
    example_input=example_input,
)
result = exporter.export(model, config)
```
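A TorchScript export can be sketched the same way, assuming `ModelExportConfig` takes the same fields for this format (per the supported-formats list above):

```python
# Export to TorchScript; example_input allows a tracing fallback
# (torch.jit.trace) when scripting is not possible.
ts_config = ModelExportConfig(
    export_path="models/model.pt",
    format="torchscript",
    example_input=example_input,
)
ts_result = exporter.export(model, ts_config)
```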
Functions
export
```python
export(model: nn.Module, config: ModelExportConfig) -> ExportResult
```
Export a PyTorch model to the specified format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `nn.Module` | The PyTorch model to export. | *required* |
| `config` | `ModelExportConfig` | Configuration object specifying format, path, and other options. | *required* |
Returns:

| Type | Description |
|---|---|
| `ExportResult` | A dataclass containing the path to the exported artifact and metadata. |
Raises:

| Type | Description |
|---|---|
| `ExportError` | If the export process fails (e.g., invalid format, missing example input). |
Source code in mlpotion/frameworks/pytorch/deployment/exporters.py
See the PyTorch Guide for usage examples.