DistML provides a supplement to Spark MLlib, adding support for model-parallel training on Spark.

DistML (Distributed Machine Learning platform)

DistML is a machine learning tool that allows training very large models on Spark. It is fully compatible with Spark (tested on Spark 1.2 and above).

Reference paper: Large Scale Distributed Deep Networks

Runtime view: (runtime architecture diagram; see the image in the repository)

DistML provides several algorithms (LR, LDA, Word2Vec, ALS) to demonstrate its scalability. For other models you may need to write your own algorithm on top of the DistML APIs (Model, Session, Matrix, DataStore, ...). Extending an existing algorithm to DistML is generally straightforward; the sketch below illustrates the overall pattern, and the guide "How to implement logistic regression on DistML" walks through LR as a concrete example.
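
The following is a minimal, self-contained Scala sketch of the parameter-server pattern such an algorithm follows: fetch the shared parameters, compute an update on a local data partition, and push the update back. The DenseWeights and ParamSession classes here are simplified stand-ins written for this illustration only; they are not the real DistML Model/Session/Matrix classes, so refer to the LR guide and the API document for the actual interfaces.

    // Stand-in for a DistML-style dense parameter store (a flat weight vector).
    class DenseWeights(val dim: Int) {
      val values: Array[Double] = new Array[Double](dim)
    }

    // Stand-in for a DistML-style session: the handle a worker uses to fetch
    // and update the globally shared model parameters.
    class ParamSession(weights: DenseWeights) {
      def fetch(): Array[Double] = weights.values.clone()
      def push(delta: Array[Double]): Unit =
        for (i <- delta.indices) weights.values(i) += delta(i)
    }

    object LogisticRegressionSketch {
      // One local pass of logistic regression over a data partition.
      // Samples are (features, label) pairs with label in {0, 1}.
      def trainPartition(session: ParamSession,
                         samples: Iterator[(Array[Double], Double)],
                         learningRate: Double): Unit = {
        val w = session.fetch()                      // pull current weights
        val delta = new Array[Double](w.length)
        for ((x, y) <- samples) {
          val margin = w.zip(x).map { case (wi, xi) => wi * xi }.sum
          val prediction = 1.0 / (1.0 + math.exp(-margin))
          val err = y - prediction
          for (i <- x.indices) delta(i) += learningRate * err * x(i)
        }
        session.push(delta)                          // push accumulated update
      }

      def main(args: Array[String]): Unit = {
        val weights = new DenseWeights(3)
        val session = new ParamSession(weights)
        val data = Seq(
          (Array(1.0, 0.5, -0.2), 1.0),
          (Array(1.0, -1.0, 0.3), 0.0)
        )
        trainPartition(session, data.iterator, learningRate = 0.1)
        println(weights.values.mkString(", "))
      }
    }

In DistML itself the parameter fetch/push would go over the network to parameter-server nodes and trainPartition would run inside a Spark partition, but the division of work shown here (shared model state vs. per-partition gradient computation) is the part you implement when adding a new algorithm.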

User Guide

  1. Download and build DistML.
  2. Typical options.
  3. Run Sample - LR.
  4. Run Sample - MLR.
  5. Run Sample - LDA.
  6. Run Sample - Word2Vec.
  7. Run Sample - ALS.
  8. Benchmarks.
  9. FAQ.

API Document

  1. Source Tree.
  2. DistML API.

Contributors

He Yunlong (Intel)
Sun Yongjie (Intel)
Liu Lantao (Intern, Graduated)
Hao Ruixiang (Intern, Graduated)
