MLPath

MLPath is an MLOps library for Python that makes it easier to track machine learning experiments and organize machine learning projects. It currently consists of two subpackages: MLQuest for tracking and MLDir for directory structure.

Check this for documentation and this for a full version of the quick start below.

💻 Installation

pip install mlpath

MLPath isn't "just another" machine learning tracking library:

Unlike other libraries, MLPath requires minimal boilerplate for tracking and infers hyperparameter names automatically
Does not restrict developers to using a web interface. Logs can be shown in the notebook itself!
Less abstraction: Logs can be treated as Pandas tables for additional operations or visualizations
Comes with MLDir, which automatically generates a directory structure and sets standards for it to maximize organization and reproducibility
MLDir also makes it easier to wrap models that map files to outputs in a web interface

MLQuest

🚀 Quick Start

This is your code without mlquest

# Preprocessing
x_data_p = Preprocessing(x_data=[1, 2, 3], alpha=1024, beta_param=7, c=12)

# Feature Extraction
x_data_f = FeatureExtraction(x_data_p, 14, 510, 4)  

# Model Initialization
model = RadialBasisNet(x_data_f, 12, 2, 3)

# Model Training
accuracy = train_model(model)

This is your code with mlquest

# 1. Import the Package
from mlpath import mlquest as mlq
l = mlq.l

# 2. Start a new quest; this simply creates a table or loads an existing one to log your next run
mlq.start_quest('Radial Basis Pipeline', log_defs=False)     

# 3. Wrap function calls to be logged with `l()`

# Preprocessing
x_data_p = l(Preprocessing)(x_data=[1, 2, 3], alpha=1114, beta_param=2, c=925)

# Feature Extraction
x_data_f = l(FeatureExtraction)(x_data_p, 32, 50, 4)  # x_data_p is an array so it won't be logged.

# Model Initialization
model = l(RadialBasisNet)(x_data_f, 99, 19, 31)

# Model Training
accuracy = train_model(model)

# 4. Log any metrics if needed
mlq.log_metrics(accuracy)        # can also do mlq.log_metric(acc=accuracy) so it's logged as acc

# 5. End the quest to push the experiment to the table and save as markdown at './'
mlq.end_quest('./')

# 6. View the table (only for notebooks)
mlq.show_logs(last_k=10)         # show the table for the last 10 runs

After three runs, this produces the following table, shown below the cell in the notebook or in the separate markdown file.

info				Preprocessing			FeatureExtraction			RadialBasisNet			metrics
time	date	duration	id	alpha	beta_param	c	x_param	y_param	z_param	p_num	k_num	l_num	accuracy
16:31:16	02/11/23	1.01 min	1	74	12	95	13	530	4	99	99	3	50
16:32:40	02/11/23	4.91 ms	2	14	2	95	132	530	4	99	19	3	70
16:32:57	02/11/23	4.93 ms	3	1114	2	925	32	50	4	99	19	31	70

Editors like VSCode support viewing markdown out-of-the-box. You may need to press CTRL/CMD+Shift+V. You can see a fuller version of this quick start in the documentation which corresponds to the Full-Example notebook found here which you can also run locally.
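In the table above, the positional arguments 32, 50 and 4 appear under the column names x_param, y_param and z_param, while the array argument is not logged. The following is a minimal sketch of how a wrapper could recover parameter names in this way using the standard library's inspect module. It is not MLQuest's actual implementation: log_call, run_log and the body of FeatureExtraction are hypothetical names and placeholders introduced only for illustration.

```python
import inspect
from functools import wraps

# Hypothetical in-memory log; MLQuest's real storage is not shown in this README.
run_log = {}

def log_call(func):
    """Wrap func so that scalar arguments are recorded under their parameter names."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # bind() maps positional and keyword arguments to parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # Keep only simple scalar values; arrays and other objects are skipped.
        scalars = {
            name: value
            for name, value in bound.arguments.items()
            if isinstance(value, (bool, int, float, str))
        }
        run_log[func.__name__] = scalars
        return func(*args, **kwargs)

    return wrapper

# Stand-in for the quick start's FeatureExtraction; the body is a placeholder.
def FeatureExtraction(x_data, x_param, y_param, z_param):
    return [value * x_param for value in x_data]

features = log_call(FeatureExtraction)([1, 2, 3], 32, 50, 4)
print(run_log)
```

Because the names come from the function signature, the call site stays unchanged apart from the wrapper, which is the property the quick start relies on.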

An Example with Scikit-Learn

Check Example.ipynb or equivalently the following Colab notebook.

An Example with PyTorch

More examples with scikit-learn and an example with PyTorch can be found by running mldir --example, as illustrated below.

🌐 A Web Interface is also Supported

Simply run mlq.run_server() after mlq.end_quest().

⦿ You can search for specific runs, an example would be metrics.accuracy>50 (similar syntax to MLFlow)

⦿ You can customize the columns shown in the table by clicking on columns (instead of doing it through a JSON config file)
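To make the search syntax concrete, here is a rough sketch of how a query such as metrics.accuracy>50 could be evaluated against logged runs. The grammar accepted by the web interface is not specified here, so the pattern, the supported operators and the filter_runs helper are all assumptions; the accuracy values are taken from the quick start table.

```python
import operator
import re

# Comparison operators supported by this sketch (an assumption, not MLQuest's list).
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

# Matches "<group>.<column><operator><number>", e.g. "metrics.accuracy>50".
QUERY_PATTERN = re.compile(r"^\s*(\w+)\.(\w+)\s*(>=|<=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")

def filter_runs(runs, query):
    """Return the runs whose group.column value satisfies the comparison."""
    match = QUERY_PATTERN.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    group, column, symbol, raw_value = match.groups()
    compare = OPERATORS[symbol]
    threshold = float(raw_value)
    return [run for run in runs if compare(run[group][column], threshold)]

# The three runs from the quick start table, reduced to id and accuracy.
runs = [
    {"info": {"id": 1}, "metrics": {"accuracy": 50}},
    {"info": {"id": 2}, "metrics": {"accuracy": 70}},
    {"info": {"id": 3}, "metrics": {"accuracy": 70}},
]
matching_ids = [run["info"]["id"] for run in filter_runs(runs, "metrics.accuracy>50")]
print(matching_ids)  # [2, 3]
```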

MLDir

MLDir is a simple CLI that creates a standard directory structure for your machine learning project. It provides a folder structure that is comprehensive, highly scalable (development-wise) and apt for collaboration.

Note of caution

⦿ Although MLDir integrates well with MLQuest, neither requires the other to function.

⦿ If your project has very few people working on it (for example, only you) or does not require trying many models with many preprocessing methods and features, you may not really need MLDir; a notebook and MLQuest should be enough. Otherwise, use MLDir to prevent your directory from becoming a spaghetti soup of Python files.

📜 The MLDir Manifesto

The directory structure generated by MLDir complies with the MLDir manifesto (a set of 'soft' standards), which attempts to enforce separation of concerns between the different stages of the machine learning pipeline and between writing code and running experiments (hyperparameter tuning). We recommend that you read more about the manifesto here.

To get started

MLDir is part of MLPath, so you don't need to install it separately. To create a simple folder structure, run:

mldir --name <project_name>

⦿ If mldir is run without a name, it uses the name 'Project'

This generates the following folder structure (with dummy names for features and models):

.
├── DataPreparation
│   ├── Ingestion.py
│   └── Preprocessing.py
├── FeatureExtraction
│   ├── BoW
│   │   └── BoW.py
│   ├── GLCM
│   │   └── GLCM.py
│   └── OneHot
│       └── OneHot.py
├── GIT-README.md
├── ModelPipelines
│   ├── GRU
│   │   └── OneHot-GRU.ipynb
│   ├── GradientBoost
│   │   ├── BoW-GB.ipynb
│   │   └── GLCM-GB.ipynb
│   └── SVM
│       └── BoW-SVM.ipynb
├── ModelScoring
│   ├── Pipeline.py
│   └── Scoring.py
├── README.md
└── Sandbox.ipynb

Each generated file contains instructions on how to use it. The README.md collects these instructions together with a more detailed explanation.
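For readers who want to explore the layout before installing anything, the sketch below recreates a subset of the tree above as empty placeholder files in a temporary directory. It only illustrates the structure and is not how mldir works internally: the scaffold helper and the LAYOUT list are hypothetical, and the real command also writes instructions into each file.

```python
import tempfile
from pathlib import Path

# Subset of the layout listed above; the file names are the README's dummy names.
LAYOUT = [
    "DataPreparation/Ingestion.py",
    "DataPreparation/Preprocessing.py",
    "FeatureExtraction/BoW/BoW.py",
    "ModelPipelines/SVM/BoW-SVM.ipynb",
    "ModelScoring/Pipeline.py",
    "ModelScoring/Scoring.py",
    "README.md",
    "Sandbox.ipynb",
]

def scaffold(root, layout=LAYOUT):
    """Create empty placeholder files under root and return the created paths."""
    root = Path(root)
    created = []
    for relative in layout:
        target = root / relative
        # Create intermediate folders such as FeatureExtraction/BoW as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
        created.append(target)
    return created

with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "Project"
    created = scaffold(project_root)
    top_level = sorted(path.name for path in project_root.iterdir())

print(top_level)
```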

Other important options

mldir --name <project-name> --full

⦿ The --full option generates an even more comprehensive folder structure, including folders such as ModelImplementations, References and, most importantly, Production.

⦿ The Production folder contains a Flask app that can be used to serve your model as an API. All you need to do is import your final model into app.py and replace the dummy model with it. The Flask app assumes that your model takes a file via its path and returns a prediction, but it can easily be extended to suit your needs.
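The contract described above (a model that takes a file path and returns a prediction) can be sketched without any web framework. The example below is not the generated app.py and omits the Flask wiring entirely; DummyModel, its predict method and handle_prediction are hypothetical names, and the dummy "prediction" is simply the file size in bytes.

```python
import json
import tempfile
from pathlib import Path

class DummyModel:
    """Placeholder model: 'predicts' the number of bytes in the file."""

    def predict(self, file_path):
        return Path(file_path).stat().st_size

def handle_prediction(model, file_path):
    """Return a JSON string like the body an API endpoint might send back."""
    path = Path(file_path)
    if not path.is_file():
        return json.dumps({"error": f"file not found: {path.name}"})
    return json.dumps({"prediction": model.predict(path)})

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello")
    ok_response = handle_prediction(DummyModel(), sample)
    missing_response = handle_prediction(DummyModel(), Path(tmp) / "missing.txt")

print(ok_response)
print(missing_response)
```

Swapping in a real model under this assumption means providing any object with the same predict(file_path) method, which mirrors the "replace the dummy model" step described above.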

🚢 Complete Example (MLQuest + MLDir)

mldir --name <project-name> --example

⦿ The --example option generates a complete example on a tiny dataset (and real models) that should be helpful for you to understand more about the folder structure and how to use it (e.g., you can use it as a template for your own project).