2024 updates - GPU: A10 / L4 / A100 / H100
On p3.2xlarge (V100) (the GPU used in the current benchmark):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.4 | 14 | 42 | 0.749 | 0.756 |
xgboost | 0.7 | 1.3 | 5 | 0.748 | 0.756 |
lightgbm | 7 | 9 | 40 | 0.766 | 0.791 |
catboost | 1.6 | 3.4 | 23 | 0.735 ?! | 0.737 ?! |
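For context, a minimal Python sketch of how each of these libraries is typically pointed at a GPU (illustrative only, not the benchmark's actual code, which lives in this repo; the synthetic data and hyperparameters are placeholders):

```python
# Sketch only: typical GPU flags for each library; not the benchmark code.
import xgboost as xgb
import lightgbm as lgb
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

# placeholder synthetic data (the benchmark uses the airline delay dataset)
X, y = make_classification(n_samples=100_000, n_features=10, random_state=42)

# XGBoost >= 2.0: histogram tree method on the CUDA device
bst = xgb.train({"tree_method": "hist", "device": "cuda",
                 "objective": "binary:logistic"},
                xgb.DMatrix(X, label=y), num_boost_round=100)

# LightGBM (built with GPU support): select the GPU device
gbm = lgb.train({"objective": "binary", "device": "gpu"},
                lgb.Dataset(X, label=y), num_boost_round=100)

# CatBoost: task_type="GPU" moves training to the GPU
cb = CatBoostClassifier(iterations=100, task_type="GPU", verbose=False)
cb.fit(X, y)
```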
On g5.4xlarge (A10G) (newer, but lower-end GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.8 | 14 | 32 | 0.749 | 0.756 |
xgboost | 0.8 | 1.6 | 5 | 0.748 | 0.756 |
lightgbm | 5 | 7 | 28 | 0.766 | 0.791 |
catboost | 1.6 | 3.3 | 20 | 0.735 ?! | 0.737 ?! |
Roughly the same speed as on the V100.
On g6.4xlarge (L4) (newer, but lower-end GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.6 | 13 | 31 | 0.749 | 0.756 |
xgboost | 1.4 | 1.9 | 6 | 0.748 | 0.756 |
lightgbm | 3.7 | 5 | 22 | 0.766 | 0.791 |
catboost | 2.6 | 3.6 | 25 | 0.735 ?! | 0.737 ?! |
Roughly the same speed as on the V100.
On lambdalabs gpu_1x_a100_sxm4 (A100) (newer GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.3 | 9 | 22 | 0.749 | 0.756 |
xgboost | 0.7 | 1.3 | 3.7 | 0.748 | 0.756 |
lightgbm | 6.7 | 12 | 27 | 0.766 | 0.791 |
catboost | 1.8 | 3.2 | 15 | 0.735 ?! | 0.737 ?! |
Faster (1.4-1.9x) on the largest dataset, but about the same speed on the small and medium-sized data.
On lambdalabs gpu_1x_h100_pcie (H100) (newest, most powerful GPU):
Tool | Time[s] 100K | Time[s] 1M | Time[s] 10M | AUC 1M | AUC 10M |
---|---|---|---|---|---|
h2o xgboost | 6.3 | 10 | 20 | 0.749 | 0.756 |
xgboost | 0.6 | 1.2 | 3.6 | 0.748 | 0.756 |
lightgbm | 5.6 | 7 | 16 | 0.766 | 0.791 |
catboost | 2.0 | 2.8 | 17 | 0.735 ?! | 0.737 ?! |
Faster (1.3-2.5x) on the largest dataset, but about the same speed on the small and medium-sized data.
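For reference, a sketch of how one row of these tables could be reproduced, assuming the benchmark's convention of timing only the training call; the synthetic data is a stand-in for the actual benchmark dataset:

```python
# Sketch of reproducing one table row: wall-clock time for training only,
# plus test AUC. Placeholder synthetic data; the real benchmark uses the
# airline delay dataset (see this repo).
import time
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000_000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=42)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dtest = xgb.DMatrix(X_te)

params = {"tree_method": "hist", "device": "cuda",
          "max_depth": 10, "eta": 0.1, "objective": "binary:logistic"}

t0 = time.time()                       # time the training call only
bst = xgb.train(params, dtrain, num_boost_round=100)
print(f"train time: {time.time() - t0:.1f}s")

auc = roc_auc_score(y_te, bst.predict(dtest))
print(f"test AUC: {auc:.3f}")
```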
Summary (XGBoost, 10M rows):
GPU | Time [s] | Speedup vs V100 |
---|---|---|
V100 | 5 | (base) |
A10 | 5 | 1x |
L4 | 6 | 1.2x slower |
A100 | 3.7 | 1.35x faster |
H100 | 3.6 | 1.39x faster |
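The speedup column recomputed from the (rounded) 10M-row timings above; any last-digit differences are rounding artifacts:

```python
# Recompute the speedup column from the rounded 10M-row XGBoost timings.
times = {"V100": 5.0, "A10": 5.0, "L4": 6.0, "A100": 3.7, "H100": 3.6}
base = times["V100"]
for gpu, t in times.items():
    print(f"{gpu}: {t:.1f}s -> {base / t:.2f}x vs V100")
```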
GPU specs (table generated by ChatGPT; not reproduced here).
GPU Benchmark by LambdaLabs (relative performance, data read off the bar plot, V100 = 1.0):

GPU | Relative speed |
---|---|
LambdaCloud H100 80GB PCIe Gen5 | ~5.2 |
LambdaCloud A100 40GB PCIe | ~3.5 |
LambdaCloud A10 | ~1.3 |
LambdaCloud V100 16GB | 1.0 (reference) |