fadilparves / RAPIDS_RANDOM_FOREST_HPO

HYPERPARAMETER OPTIMIZATION USING RAY TUNE AND RAPIDS

Hyperparameter optimization is a method used to enhance the accuracy of a model; tuning can make the difference between an average model and a highly accurate one. The goal of this project was to predict the value of a football player by training a random forest on the GPU and optimizing the accuracy of the prediction based on features such as rating, skill rate, work rate, attacking rate, position, and so on.

Link to view the Data

RAPIDS is a suite of open-source software libraries and APIs that gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Imagine scikit-learn on steroids; that is RAPIDS.
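To give a feel for how closely cuML (the machine learning part of RAPIDS) mirrors the scikit-learn API, here is a minimal sketch on synthetic data; the dataset and parameters are placeholders, not this repo's code:

import cudf
from cuml.ensemble import RandomForestRegressor
from cuml.metrics import r2_score
from sklearn.datasets import make_regression

# Synthetic data, purely for illustration; cuML expects float32 on the GPU.
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
X_gpu = cudf.DataFrame(X.astype("float32"))
y_gpu = cudf.Series(y.astype("float32"))

model = RandomForestRegressor(n_estimators=100, max_depth=16)
model.fit(X_gpu, y_gpu)                       # training runs on the GPU
print(r2_score(y_gpu, model.predict(X_gpu)))  # same R2 convention as sklearn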

Random Forest without HPO -> No-HPO (training on GPU)

Running Random Forest with its default settings:

n_estimators = 100, max_depth = 16, n_bins = 8

Accuracy -> R2 score = 75%
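For context, here is a sketch of what that baseline could look like in cuML. Since the README reports an R2 score, a RandomForestRegressor is assumed, and the file name and "value" target column are placeholders for the dataset linked above:

import cudf
from cuml.ensemble import RandomForestRegressor as curfr
from cuml.metrics import r2_score
from cuml.model_selection import train_test_split

# "players.csv" and the "value" column are placeholders; substitute the real
# FIFA dataset and its feature columns (rating, skill rate, work rate, ...).
df = cudf.read_csv("players.csv").astype("float32")
X, y = df.drop(columns=["value"]), df["value"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = curfr(n_estimators=100, max_depth=16, n_bins=8)
model.fit(X_train, y_train)
print("R2:", r2_score(y_test, model.predict(X_test)))  # ~0.75 reported above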

The goal is to tune the model and improve its accuracy without doing any feature engineering.

Hence the Ray Tune module comes into the picture: Ray.Tune

Ray Tune Config

number of samples = 10, number of folds = 3, range for n_estimators = 500 - 1500, range for max_depth = 10 - 20, range for max_features = 0.5 - 1.0, n_bins = 18

The configuration here is restricted because the machine used for this experiment only has a GTX 1060. If you have a more powerful GPU, you can test a wider range of configurations.

Ray randomly samples a value from each range and passes it into the model as a hyperparameter.
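A sketch of what that search space could look like with Ray Tune's sampling primitives is below. The fractional n_estimators and max_depth values in the trial table further down suggest continuous uniform sampling with an int cast inside the trainable, so that is what is assumed here:

from ray import tune

# Search space matching the ranges above; tune.uniform samples floats, so the
# trainable presumably casts n_estimators and max_depth to int before use.
search_space = {
    "n_estimators": tune.uniform(500, 1500),
    "max_depth": tune.uniform(10, 20),
    "max_features": tune.uniform(0.5, 1.0),
    "n_bins": 18,  # fixed, not tuned
}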

# From the repo's trainable: build a cuML random forest from the sampled
# hyperparameters. curfc is assumed to be the usual RAPIDS alias,
# "from cuml.ensemble import RandomForestClassifier as curfc".
self.rf_model = curfc(
    n_estimators=self._model_params["n_estimators"],
    max_depth=self._model_params["max_depth"],
    n_bins=self._model_params["n_bins"],
    max_features=self._model_params["max_features"],
)

A total of 300 samples were executed, but some iterations stopped early due to the early-stopping condition.
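For reference, here is a hedged sketch of the driver call that produces a trial table like the one below; the real trainable and stopping rule live in random_forest_hpo.py, and the ASHA scheduler and metric name here are assumptions, not necessarily this repo's choices:

from ray import tune
from ray.tune.schedulers import ASHAScheduler

analysis = tune.run(
    WrappedTrainable,     # the cuML trainable built around curfc above
    config=search_space,  # the sampled ranges sketched earlier
    num_samples=10,       # matches the 10 trials below
    scheduler=ASHAScheduler(metric="mean_accuracy", mode="max"),
)
print(analysis.get_best_config(metric="mean_accuracy", mode="max"))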

+---------------------+------------+-------+-------------+----------------+----------------+--------+------------------+
| Trial name          | status     | loc   |   max_depth |   max_features |   n_estimators |   iter |   total time (s) |
|---------------------+------------+-------+-------------+----------------+----------------+--------+------------------|
| WrappedTrainable_1  | TERMINATED |       |     13.7454 |       0.975357 |       1231.99  |      3 |         234.135  |
| WrappedTrainable_2  | TERMINATED |       |     15.9866 |       0.578009 |        655.995 |      1 |          35.5271 |
| WrappedTrainable_3  | TERMINATED |       |     10.5808 |       0.933088 |       1101.12  |      1 |          58.8539 |
| WrappedTrainable_4  | TERMINATED |       |     17.0807 |       0.510292 |       1469.91  |      1 |          98.2842 |
| WrappedTrainable_5  | TERMINATED |       |     18.3244 |       0.60617  |        681.825 |      3 |         180.687  |
| WrappedTrainable_6  | TERMINATED |       |     11.834  |       0.652121 |       1024.76  |      3 |         124.095  |
| WrappedTrainable_7  | TERMINATED |       |     14.3195 |       0.645615 |       1111.85  |      3 |         149.505  |
| WrappedTrainable_8  | TERMINATED |       |     11.3949 |       0.646072 |        866.362 |      1 |          36.0093 |
| WrappedTrainable_9  | TERMINATED |       |     14.5607 |       0.892588 |        699.674 |      1 |          43.3045 |
| WrappedTrainable_10 | TERMINATED |       |     15.1423 |       0.796207 |        546.45  |      3 |         112.048  |
+---------------------+------------+-------+-------------+----------------+----------------+--------+------------------+

The output for all trials is stored in trials.csv.
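For example, the best trial can be pulled out of that file with pandas; the metric column name here is a guess, so check the CSV header first:

import pandas as pd

trials = pd.read_csv("trials.csv")
best = trials.loc[trials["mean_accuracy"].idxmax()]  # column name assumed
print(best[["max_depth", "max_features", "n_estimators"]])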

The best-performing parameters came from trial 6:

max_depth = 11, max_features = 0.65, n_estimators = 1024 (rounded from the table above)

With these hyperparameters the model accuracy increased to 83%. As you can see, finding better parameters makes the model more accurate; all that is left now is to work on feature engineering and re-run the HPO to push the accuracy even higher.
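Refitting with the winning hyperparameters is then a small change to the baseline sketch above (same assumed regressor and data split):

from cuml.ensemble import RandomForestRegressor as curfr
from cuml.metrics import r2_score

# X_train/X_test/y_train/y_test as prepared in the baseline sketch earlier.
best_model = curfr(n_estimators=1024, max_depth=11,
                   max_features=0.65, n_bins=18)
best_model.fit(X_train, y_train)
print("R2:", r2_score(y_test, best_model.predict(X_test)))  # ~0.83 reported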

For those interested in trying this out:

  1. Install RAPIDS
  2. Install Ray Tune: pip install 'ray[tune]' torch torchvision
  3. Clone this repo
  4. Run python random_forest_hpo.py (or python3, depending on your setup)

References

  1. RAPIDS example
  2. Medium article
  3. Ray Tune
  4. cuML
