Angel-ML / sona

Spark On Angel, arming Spark with a powerful Parameter Server, which enables Spark to train very large models

Repository from GitHub https://github.com/Angel-ML/sona

More efficient data loading for GBDT

ccchengff opened this issue

The binning process in GBDT requires caching two copies of the dataset, which is inefficient for users with limited memory.

Implement a two-phase data loading method, which is more memory efficient.
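The issue does not spell out what the two phases are, so the following is only a minimal sketch of one plausible reading, not the SONA implementation. It assumes phase 1 streams the data once to estimate per-feature quantile cut points from a bounded sample, and phase 2 streams it again to store only compact bin indices, so the raw feature values are never cached alongside the binned copy. All names here (`compute_bin_edges`, `load_binned`, `read_rows`) are hypothetical, and plain Python stands in for the Spark job.

```python
import bisect
import random
from array import array
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Row = Sequence[float]


def compute_bin_edges(rows: Iterable[Row], num_features: int, num_bins: int,
                      sample_size: int = 10_000, seed: int = 0) -> List[List[float]]:
    """Phase 1: one streaming pass that keeps a bounded reservoir sample
    and derives per-feature quantile cut points from it."""
    rng = random.Random(seed)
    sample: List[Row] = []
    for i, row in enumerate(rows):
        if len(sample) < sample_size:
            sample.append(row)
        else:
            # Reservoir sampling (Algorithm R): replace a slot with
            # probability sample_size / (i + 1).
            j = rng.randint(0, i)
            if j < sample_size:
                sample[j] = row

    edges: List[List[float]] = []
    n = len(sample)
    for f in range(num_features):
        values = sorted(r[f] for r in sample)
        cuts = {values[(k * n) // num_bins] for k in range(1, num_bins)} if n else set()
        edges.append(sorted(cuts))  # duplicate cut points collapse into one
    return edges


def load_binned(read_rows: Callable[[], Iterator[Row]], num_features: int,
                num_bins: int, sample_size: int = 10_000
                ) -> Tuple[List[List[float]], List[array]]:
    """Run both phases. `read_rows` must return a fresh iterator on each call,
    so the source is re-read instead of being cached in raw form."""
    if not 1 <= num_bins <= 256:
        raise ValueError("num_bins must fit in one unsigned byte")

    # Phase 1: estimate the bin boundaries.
    edges = compute_bin_edges(read_rows(), num_features, num_bins, sample_size)

    # Phase 2: re-read the data and keep only one byte per feature value.
    columns = [array("B") for _ in range(num_features)]
    for row in read_rows():
        for f in range(num_features):
            columns[f].append(bisect.bisect_right(edges[f], row[f]))
    return edges, columns
```

Under these assumptions, peak memory is the bounded sample plus the one-byte-per-value binned columns, at the cost of reading the input twice. A call might look like `edges, columns = load_binned(read_rows, num_features=2, num_bins=4)`, where `read_rows` reopens the input each time it is called.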