MindAlpha

MindAlpha is a machine learning platform integrating PySpark, PyTorch and a parameter server implementation. The platform contains native support for sparse parameters, making it easy for users to develop large-scale models. Together with MindAlpha Serving, the platform provides a one-stop solution for data preprocessing, model training and online prediction.

Features

Efficient IO with PySpark. Minibatches read by PySpark as pandas DataFrames can be fed directly to models.
The API is similar to PyTorch and Spark MLlib, so users familiar with PyTorch and PySpark can get started quickly.
Custom sparse layers are wrapped as PyTorch modules, making them easy to use. These sparse layers can contain billions of parameters.
Models can be developed interactively in Jupyter Notebook, and periodic model training can be scheduled with Airflow.
The trained model can be exported via one method call and loaded by MindAlpha Serving for online prediction.
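
To make the idea of a sparse layer wrapped as a PyTorch module more concrete, here is a minimal sketch. It uses only plain PyTorch and pandas, not the MindAlpha API: the class SparseFeatureModel, the helper minibatch_to_tensor, the column names and the bucket count are all hypothetical, and the parameter-server-backed layers that MindAlpha provides are not shown. It only illustrates the general pattern of feeding a pandas DataFrame minibatch to a module whose embedding table receives sparse gradients.

```python
import pandas as pd
import torch
from torch import nn


class SparseFeatureModel(nn.Module):
    """Hypothetical model: a sparse embedding layer followed by a dense layer."""

    def __init__(self, num_buckets=1000, embedding_dim=8):
        super().__init__()
        # sparse=True makes the embedding table receive sparse gradients,
        # so only the rows touched by a minibatch carry gradient values.
        self.embedding = nn.EmbeddingBag(
            num_buckets, embedding_dim, mode="sum", sparse=True
        )
        self.dense = nn.Linear(embedding_dim, 1)

    def forward(self, feature_ids):
        # feature_ids has shape (batch_size, features_per_example);
        # each row is pooled into a single embedding vector.
        pooled = self.embedding(feature_ids)
        return torch.sigmoid(self.dense(pooled)).squeeze(1)


def minibatch_to_tensor(minibatch, columns):
    """Convert integer ID columns of a pandas DataFrame to a LongTensor."""
    return torch.as_tensor(minibatch[columns].to_numpy(), dtype=torch.long)


# A stand-in minibatch; in practice it would be read by PySpark.
minibatch = pd.DataFrame({"user_id": [3, 17, 42], "item_id": [5, 5, 999]})

model = SparseFeatureModel()
predictions = model(minibatch_to_tensor(minibatch, ["user_id", "item_id"]))

# Backpropagate a dummy loss to populate the embedding gradient.
predictions.sum().backward()
```

After the backward pass, model.embedding.weight.grad is a sparse tensor, which is what keeps updates cheap when the table is very large and each minibatch touches only a few rows.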

Build

First, run the build script with the -i option to build a Docker image:

sh run_build.sh -i

For more details, please refer to docker/ubuntu20.04/Dockerfile and docker/centos7/Dockerfile.

Then run the script with the -m option to compile the C++ and Python sources into a dynamic-link library (*.so) and a Python wheel package (*.whl). By default, the output is written to the build directory.

sh run_build.sh -m

Tutorials

Two tutorials are given:

MindAlpha Getting Started introduces the basic API of MindAlpha briefly.
MindAlpha Tutorial shows how to use MindAlpha in the production environment.

About

Apache License 2.0
