2024 updates - GPU: A10 / L4 / A100 / H100
On p3.2xlarge (V100) (the GPU used in the current benchmark):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.4 | 14 | 42 | 0.749 | 0.756 |
xgboost | 0.7 | 1.3 | 5 | 0.748 | 0.756 |
lightgbm | 7 | 9 | 40 | 0.766 | 0.791 |
catboost | 1.6 | 3.4 | 23 | 0.735 ?! | 0.737 ?! |
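For context, a minimal Python sketch of how each of these libraries is typically pointed at a GPU (illustrative only, not the benchmark's actual code, which lives in this repo; the synthetic data and hyperparameters are placeholders):

```python
# Sketch only: typical GPU flags for each library; not the benchmark code.
import xgboost as xgb
import lightgbm as lgb
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

# placeholder synthetic data (the benchmark uses the airline delay dataset)
X, y = make_classification(n_samples=100_000, n_features=10, random_state=42)

# XGBoost >= 2.0: histogram tree method on the CUDA device
bst = xgb.train({"tree_method": "hist", "device": "cuda",
                 "objective": "binary:logistic"},
                xgb.DMatrix(X, label=y), num_boost_round=100)

# LightGBM (built with GPU support): select the GPU device
gbm = lgb.train({"objective": "binary", "device": "gpu"},
                lgb.Dataset(X, label=y), num_boost_round=100)

# CatBoost: task_type="GPU" moves training to the GPU
cb = CatBoostClassifier(iterations=100, task_type="GPU", verbose=False)
cb.fit(X, y)
```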
On g5.4xlarge (A10G) (newer, but lower-end GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.8 | 14 | 32 | 0.749 | 0.756 |
xgboost | 0.8 | 1.6 | 5 | 0.748 | 0.756 |
lightgbm | 5 | 7 | 28 | 0.766 | 0.791 |
catboost | 1.6 | 3.3 | 20 | 0.735 ?! | 0.737 ?! |
Roughly the same speed as on the V100.
On g6.4xlarge (L4) (newer, but lower-end GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.6 | 13 | 31 | 0.749 | 0.756 |
xgboost | 1.4 | 1.9 | 6 | 0.748 | 0.756 |
lightgbm | 3.7 | 5 | 22 | 0.766 | 0.791 |
catboost | 2.6 | 3.6 | 25 | 0.735 ?! | 0.737 ?! |
Roughly the same speed as on the V100.
On lambdalabs gpu_1x_a100_sxm4 (A100) (newer GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.3 | 9 | 22 | 0.749 | 0.756 |
xgboost | 0.7 | 1.3 | 3.7 | 0.748 | 0.756 |
lightgbm | 6.7 | 12 | 27 | 0.766 | 0.791 |
catboost | 1.8 | 3.2 | 15 | 0.735 ?! | 0.737 ?! |
Faster (1.4-1.9x) on the largest dataset, but about the same speed on the small and medium-sized data.
On lambdalabs gpu_1x_h100_pcie (H100) (newest, most powerful GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.3 | 10 | 20 | 0.749 | 0.756 |
xgboost | 0.6 | 1.2 | 3.6 | 0.748 | 0.756 |
lightgbm | 5.6 | 7 | 16 | 0.766 | 0.791 |
catboost | 2.0 | 2.8 | 17 | 0.735 ?! | 0.737 ?! |
Faster (1.3-2.5x) on the largest dataset, but about the same speed on the small and medium-sized data.
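For reference, a sketch of how one row of these tables could be reproduced, assuming the benchmark's convention of timing only the training call; the synthetic data is a stand-in for the actual benchmark dataset:

```python
# Sketch of reproducing one table row: wall-clock time for training only,
# plus test AUC. Placeholder synthetic data; the real benchmark uses the
# airline delay dataset (see this repo).
import time
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000_000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=42)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dtest = xgb.DMatrix(X_te)

params = {"tree_method": "hist", "device": "cuda",
          "max_depth": 10, "eta": 0.1, "objective": "binary:logistic"}

t0 = time.time()                       # time the training call only
bst = xgb.train(params, dtrain, num_boost_round=100)
print(f"train time: {time.time() - t0:.1f}s")

auc = roc_auc_score(y_te, bst.predict(dtest))
print(f"test AUC: {auc:.3f}")
```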
Summary (XGBoost, 10M rows):
GPU | Time [s] | Speedup vs V100 |
---|---|---|
V100 | 5 | (base) |
A10 | 5 | 1x |
L4 | 6 | 1.2x slower |
A100 | 3.7 | 1.35x faster |
H100 | 3.6 | 1.39x faster |
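The speedup column recomputed from the (rounded) 10M-row timings above; any last-digit differences are rounding artifacts:

```python
# Recompute the speedup column from the rounded 10M-row XGBoost timings.
times = {"V100": 5.0, "A10": 5.0, "L4": 6.0, "A100": 3.7, "H100": 3.6}
base = times["V100"]
for gpu, t in times.items():
    print(f"{gpu}: {t:.1f}s -> {base / t:.2f}x vs V100")
```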
GPU specs (table generated by ChatGPT; not reproduced here).
GPU Benchmark by LambdaLabs (relative performance, data read off the bar plot, V100 = 1.0):

GPU | Relative speed |
---|---|
LambdaCloud H100 80GB PCIe Gen5 | ~5.2 |
LambdaCloud A100 40GB PCIe | ~3.5 |
LambdaCloud A10 | ~1.3 |
LambdaCloud V100 16GB | 1.0 (reference) |