The accuracy of random forest is over 0.99

Question

mk123qwe opened this issue 5 years ago

I fit a simple random forest model, like this:
from sklearn.ensemble import RandomForestClassifier
# 10 trees, fixed seed for reproducibility
clf = RandomForestClassifier(n_estimators=10, random_state=2019)

Using the high-level features listed in TABLE IV of your paper, I get:

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 16 out of 16 | elapsed: 3.2min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 16 out of 16 | elapsed: 1.2s finished
Validation Accuracy: 0.996

Javier Duarte · Tue Dec 31 2019 02:53:49 GMT+0800 (China Standard Time)

How much of the training data did you use?

I tried using just one file (~20k events, which admittedly might not be enough) and only got up to ~80% test accuracy. On the other hand, if I evaluate on the training data itself, the accuracy is >99%.
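A minimal sketch of the train-versus-test comparison described above. It assumes synthetic data from scikit-learn's `make_classification` in place of the real events; the sample count, feature count, and label-noise level are illustrative assumptions, not the setup used in this issue. Fully grown trees tend to reproduce the labels they were trained on, so accuracy measured on training events is expected to come out much higher than on held-out events.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real events (assumption: the actual dataset is not loaded here).
# flip_y=0.2 assigns a random class to about 20% of samples (label noise),
# so a perfect held-out score is not expected.
X, y = make_classification(n_samples=20000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=2019)

# Hold out 20% of the events; the model never sees them during fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2019, stratify=y)

clf = RandomForestClassifier(n_estimators=10, random_state=2019)
clf.fit(X_train, y_train)

# Accuracy on the events used for training vs. on unseen events.
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

A large gap between the two numbers is the usual sign that a score close to 1.0 was measured on events the forest had already seen.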

How is the validation accuracy defined here?
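For reference, a short sketch of the standard classification accuracy: the fraction of events whose predicted label matches the true label. The labels below are made up for illustration. The number only says something about generalization when `y_true` and `y_pred` come from events that were not used to fit the model.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels for five events.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Accuracy by hand: mean of the element-wise "prediction is correct" flags (4 of 5 correct).
manual = np.mean(y_pred == y_true)
# Same quantity via scikit-learn.
library = accuracy_score(y_true, y_pred)
print(manual, library)
```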

My code is here: https://github.com/jmduarte/HiggsToBBMachineLearning/blob/randomforest/train.ipynb
Binder link: https://mybinder.org/v2/gh/jmduarte/HiggsToBBMachineLearning/randomforest?filepath=train.ipynb

Thanks,
Javier