dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Home Page: https://xgboost.readthedocs.io/en/stable/


RoadMap

tqchen opened this issue

Follow-up on #574. This will serve as the roadmap issue for 2016.

This issue will be the centralized place for links to ongoing proposals and plans for improving xgboost. Please reply to this issue for discussion. The major goals and specifications will be marked with the Roadmap label.

Distributed Version

  • Distributed Python version of xgboost, enabling features such as custom objective functions (see the sketch below)
  • XGBoost JVM package
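
For reference, a minimal sketch of what a custom objective looks like in the single-machine Python API (the distributed version would need to accept the same kind of callback); the data here is synthetic:

```python
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """Custom objective: return per-row gradient and hessian of the loss."""
    labels = dtrain.get_label()
    grad = preds - labels       # derivative of 0.5 * (pred - label)^2
    hess = np.ones_like(preds)  # second derivative is constant
    return grad, hess

# Synthetic data just to make the sketch runnable.
X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=10,
                    obj=squared_error_obj)
```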

Data Frame Integration
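
No sub-items were listed here; for context, a sketch of what data-frame integration looks like in the Python package, assuming pandas (a DMatrix built directly from a DataFrame, with column names carried over as feature names):

```python
import pandas as pd
import xgboost as xgb

df = pd.DataFrame({"f0": [1.0, 2.0, 3.0], "f1": [0.1, 0.2, 0.3]})
labels = pd.Series([0, 1, 0])

# A DMatrix can be built straight from a DataFrame; the column names
# become the feature names of the trained model.
dtrain = xgb.DMatrix(df, label=labels)
print(dtrain.feature_names)  # ['f0', 'f1']
```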

External Memory

  • Enable the external-memory version in the language packages (R/Python/Julia) from native data structures (see the sketch below)
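
For illustration, the external-memory mode as exposed in the Python package: appending a cache-file prefix after `#` to a libsvm file path tells xgboost to stream the data from disk rather than load it all into RAM (`train.libsvm` here is a placeholder path):

```python
import xgboost as xgb

# 'train.libsvm' is a hypothetical file; the '#dtrain.cache' suffix asks
# xgboost to page the data through on-disk cache files with that prefix
# instead of holding the whole dataset in memory.
dtrain = xgb.DMatrix("train.libsvm#dtrain.cache")
booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=10)
```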

The distributed Python version is included in PR #897.

I have not tried this yet, but iterated bagging seems very similar to gradient boosting, and the paper reports better performance:

http://www.cs.utexas.edu/~ml/papers/bv-ecml-05.pdf

That said, I understand it is actually a slightly different algorithm, something between boosting and random forests.
I have no insight into the internals of xgboost (beyond my skill), so I am not sure how difficult this would be.
Another consideration: bagging can be parallelized at a higher level by building several trees at the same time, so I would guess it could be even faster.

Then again, it may be a lot of work for perhaps a small improvement.
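
For what it's worth, xgboost does expose a middle ground between boosting and random forests through its `num_parallel_tree` parameter, which grows several bagged trees per boosting round; a rough sketch (synthetic data, parameter values chosen arbitrarily, parameter names from the current Python package rather than from this proposal):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
y = np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "num_parallel_tree": 100,  # bagged trees grown per boosting round
    "subsample": 0.8,          # row subsampling per tree
    "colsample_bynode": 0.8,   # column subsampling per split
    "learning_rate": 1.0,      # no shrinkage: one round == a random forest
}
# A single round here behaves like a random forest; several rounds give
# a "boosted forest", i.e. something between the two algorithms.
booster = xgb.train(params, dtrain, num_boost_round=1)
```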

Proposal 1

I believe this was the roadmap for last year; is there any update on it?