Ember - Automated Gradient Boosting Framework

Work is still in progress. When the first version is ready, the package will be published to PyPI.

What is it

Ember is a project that aims to provide developers with an easy-to-use interface for creating machine learning pipelines, from preprocessing to the final optimized model.

It is based on scikit-learn and gradient boosting frameworks like:

It also uses some other libraries like

At your disposal are:

How to use it

Currently you need to clone the repository or download it as a zip archive and include the code in your project manually. This should change soon, so you will be able to install it as a PyPI package.
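If you prefer cloning, a minimal sketch (the URL is inferred from the repository name and may differ):

git clone https://github.com/CoInitialized/Ember.git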

Submodules example:

In most cases you will want to use the independent modules to quickly assemble your model. This workflow is inspired by the way Keras works.

from ember.preprocessing import Preprocessor, GeneralEncoder
from ember.utils import DtypeSelector
from ember.optimize import GridSelector, BayesSelector
from ember.impute import GeneralImputer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

# Load the classification dataset and split off the target column
dataset_classification = r'Ember\datasets\classification\autos.csv'
data = pd.read_csv(dataset_classification)

X, y = data.drop(columns=['class']), data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Build a preprocessor with separate branches for numerical and categorical columns
preprocessor = Preprocessor()
preprocessor.add_branch('categorical')
preprocessor.add_branch('numerical')

# Numerical branch: select numeric columns and impute missing values
preprocessor.add_transformer_to_branch('numerical', DtypeSelector(np.number))
preprocessor.add_transformer_to_branch('numerical', GeneralImputer('Simple'))

# Categorical branch: select object-dtype columns (np.object was removed from
# recent NumPy, so plain `object` is used), impute with the most frequent value
# and label-encode
preprocessor.add_transformer_to_branch('categorical', DtypeSelector(object))
preprocessor.add_transformer_to_branch('categorical', GeneralImputer('Simple', strategy='most_frequent'))
preprocessor.add_transformer_to_branch('categorical', GeneralEncoder(kind='LE'))

# Merge both branches into a single transformer
final = preprocessor.merge()

# Grid-search for the best classifier and chain it after the preprocessor
model = GridSelector('classification')

clf_pipe = make_pipeline(final, model)
clf_pipe.fit(X_train, y_train)

print(accuracy_score(y_test, clf_pipe.predict(X_test)))

As you can see, creating the whole pipeline is very simple. Excluding imports, it is many times shorter than an equivalent written in the standard way.
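The BayesSelector imported above is not used in this example; as a hedged sketch (assuming it accepts the same objective string as GridSelector), the grid search step could be swapped for Bayesian optimization:

# Assumed to mirror GridSelector's constructor; check the optimize module for the exact signature
model = BayesSelector('classification')
clf_pipe = make_pipeline(final, model)
clf_pipe.fit(X_train, y_train)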

Automatic learner example

You also have the option to completely automate the process. In this case you can use the Learner class from the autolearn module.

from ember.autolearn import Learner
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import pandas as pd

# Load the regression dataset and split off the target column
dataset_regression = r'Ember\datasets\regression\auto-price.csv'
data = pd.read_csv(dataset_regression)

X, y = data.drop(columns=['class']), data['class']
# test_size=20 holds out 20 samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=20)

# Let the Learner preprocess the data and build an optimized model automatically
learner = Learner(objective='regression', X=X_train, y=y_train)
learner.fit(cv=5, optimizer='bayes', cat=False, speed=100)

results = learner.predict(X_test)
print(r2_score(y_test, results))

The way Learner creates and optimizes the model can be controlled by passing different hyperparameters to its fit() method.
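For example, a sketch reusing only the parameters shown in the call above (the accepted values and their exact meaning are assumptions based on that call):

# Hypothetical variation: more folds and a different speed setting for a more thorough search
learner.fit(cv=10, optimizer='bayes', cat=True, speed=10)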

Web-based GUI for AutoML functionality

If you wish to use Ember's AutoML functionality in your web browser, you can run the command:

streamlit run streamlit_app.py

It will start a server on localhost with a simple application providing the aforementioned functionality.

Further plans

  • Comment modules more clearly, following the scikit-learn docs convention more closely
  • Extend the optimization module so it can be used in more general cases
  • Build a visualization module
  • Build an automatic dashboard
  • Sphinx docs
  • Publish as a PyPI package
