szilard / GBM-perf

Performance of various open source GBM implementations


2024 updates - GPU: A10 / L4 / A100 / H100

szilard opened this issue

On p3.2xlarge (V100) (current benchmark):

| Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
|---|---|---|---|---|---|
| h2o xgboost | 6.4 | 14 | 42 | 0.749 | 0.756 |
| xgboost | 0.7 | 1.3 | 5 | 0.748 | 0.756 |
| lightgbm | 7 | 9 | 40 | 0.766 | 0.791 |
| catboost | 1.6 | 3.4 | 23 | 0.735 ?! | 0.737 ?! |
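For context, a GPU run like the `xgboost` rows above can be set up with the XGBoost Python API. The snippet below is a minimal sketch, not the repo's actual benchmark script: the synthetic data is an assumption (GBM-perf uses the airline dataset), and the depth/learning-rate/100-round settings are modeled on the GBM-perf setup.

```python
# Minimal sketch of a GPU training run similar to the "xgboost" rows above.
# NOTE: synthetic data and hyperparameters are assumptions, not the actual
# GBM-perf benchmark script (which trains on the airline dataset).
import time

params = {
    "tree_method": "hist",          # histogram-based tree construction
    "device": "cuda",               # XGBoost >= 2.0: train on the GPU
    "max_depth": 10,
    "eta": 0.1,
    "objective": "binary:logistic",
}

try:
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100_000, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(np.int32)
    dtrain = xgb.DMatrix(X, label=y)

    t0 = time.time()
    booster = xgb.train(params, dtrain, num_boost_round=100)
    print(f"train time: {time.time() - t0:.1f}s")
except Exception:
    # xgboost not installed or no GPU available; the params dict above
    # still documents the configuration being benchmarked.
    pass
```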

On g5.4xlarge (A10G) (newer, but lower-end GPU):

| Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
|---|---|---|---|---|---|
| h2o xgboost | 6.8 | 14 | 32 | 0.749 | 0.756 |
| xgboost | 0.8 | 1.6 | 5 | 0.748 | 0.756 |
| lightgbm | 5 | 7 | 28 | 0.766 | 0.791 |
| catboost | 1.6 | 3.3 | 20 | 0.735 ?! | 0.737 ?! |

Roughly the same speed as on the V100.

On g6.4xlarge (L4) (newer, but lower-end GPU):

| Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
|---|---|---|---|---|---|
| h2o xgboost | 6.6 | 13 | 31 | 0.749 | 0.756 |
| xgboost | 1.4 | 1.9 | 6 | 0.748 | 0.756 |
| lightgbm | 3.7 | 5 | 22 | 0.766 | 0.791 |
| catboost | 2.6 | 3.6 | 25 | 0.735 ?! | 0.737 ?! |

Roughly the same speed as on the V100.

On lambdalabs gpu_1x_a100_sxm4 (A100) (newer GPU):

| Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
|---|---|---|---|---|---|
| h2o xgboost | 6.3 | 9 | 22 | 0.749 | 0.756 |
| xgboost | 0.7 | 1.3 | 3.7 | 0.748 | 0.756 |
| lightgbm | 6.7 | 12 | 27 | 0.766 | 0.791 |
| catboost | 1.8 | 3.2 | 15 | 0.735 ?! | 0.737 ?! |

Faster (1.4-1.9x) on the largest dataset, but about the same speed on the small/medium-sized data.

On lambdalabs gpu_1x_h100_pcie (H100) (newest, most powerful GPU):

| Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
|---|---|---|---|---|---|
| h2o xgboost | 6.3 | 10 | 20 | 0.749 | 0.756 |
| xgboost | 0.6 | 1.2 | 3.6 | 0.748 | 0.756 |
| lightgbm | 5.6 | 7 | 16 | 0.766 | 0.791 |
| catboost | 2.0 | 2.8 | 17 | 0.735 ?! | 0.737 ?! |

Faster (1.3-2x) on the largest dataset, but about the same speed on the small/medium-sized data.

Summary XGBoost 10M size:

| GPU | Time [s] | Speedup |
|---|---|---|
| V100 | 5 | base |
| A10 | 5 | 1x |
| L4 | 6 | 1.2x slower |
| A100 | 3.7 | 1.35x faster |
| H100 | 3.6 | 1.38x faster |
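The speedup column is simply the V100 time divided by each GPU's 10M-row XGBoost time (so values above 1 mean faster than the V100). A quick check of the arithmetic:

```python
# Recompute the speedup column from the XGBoost 10M-row times above
# (speedup = V100 time / GPU time; >1 means faster than the V100).
times_10m = {"V100": 5.0, "A10": 5.0, "L4": 6.0, "A100": 3.7, "H100": 3.6}
base = times_10m["V100"]
speedups = {gpu: base / t for gpu, t in times_10m.items()}
for gpu, s in speedups.items():
    print(f"{gpu}: {s:.2f}x")
```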

GPU specs (table by ChatGPT):

*(screenshot of the GPU spec comparison table not reproduced here)*

GPU Benchmark by LambdaLabs:

*(screenshot of the LambdaLabs GPU benchmark bar plot not reproduced here)*

Relative training throughput read off the bar plot (V100 = 1.0 reference):

- LambdaCloud H100 80GB PCIe Gen5: ~5.2
- LambdaCloud A100 40GB PCIe: ~3.5
- LambdaCloud A10: ~1.3
- LambdaCloud V100 16GB: 1.0 (reference)
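Worth noting: these deep-learning throughput gains are much larger than the GBM speedups measured above. A quick side-by-side using the numbers in this issue (one plausible reading is that GBM training at these data sizes does not saturate the newer GPUs the way DL training does):

```python
# LambdaLabs' relative DL training throughput (V100 = 1.0, from the bar
# plot above) vs. the XGBoost 10M-row speedups measured in this issue.
dl_throughput = {"V100": 1.0, "A10": 1.3, "A100": 3.5, "H100": 5.2}
xgb_time_10m = {"V100": 5.0, "A10": 5.0, "A100": 3.7, "H100": 3.6}

for gpu in dl_throughput:
    gbm_speedup = xgb_time_10m["V100"] / xgb_time_10m[gpu]
    print(f"{gpu}: DL {dl_throughput[gpu]:.1f}x vs GBM {gbm_speedup:.2f}x")
```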