Wrong MFU with Blackwell
edisonchan opened this issue
USE_CUDNN=1 make all
./train_gpt2cu
---
step 61/74 | loss 3.208939 (+nanz)| norm 1.4800 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91886 tok/s
step 62/74 | loss 3.464926 (+nanz)| norm 1.3662 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91886 tok/s
step 63/74 | loss 3.402215 (+nanz)| norm 1.3542 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91886 tok/s
step 64/74 | loss 3.407495 (+nanz)| norm 1.2991 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91887 tok/s
step 65/74 | loss 3.596000 (+nanz)| norm 1.4701 (+nanz)| lr 3.00e-04 | 44.55 ms | -100.0% bf16 MFU | 91890 tok/s
step 66/74 | loss 3.038379 (+nanz)| norm 1.3047 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91889 tok/s
step 67/74 | loss 3.288985 (+nanz)| norm 1.1935 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91890 tok/s
step 68/74 | loss 3.651558 (+nanz)| norm 1.3012 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91890 tok/s
step 69/74 | loss 3.298503 (+nanz)| norm 1.2295 (+nanz)| lr 3.00e-04 | 44.60 ms | -100.0% bf16 MFU | 91886 tok/s
step 70/74 | loss 3.651726 (+nanz)| norm 1.4792 (+nanz)| lr 3.00e-04 | 44.59 ms | -100.0% bf16 MFU | 91886 tok/s
step 71/74 | loss 3.597191 (+nanz)| norm 1.1836 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91886 tok/s
step 72/74 | loss 3.750491 (+nanz)| norm 2.1341 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91885 tok/s
step 73/74 | loss 3.828187 (+nanz)| norm 1.2098 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91885 tok/s
step 74/74 | loss 3.364620 (+nanz)| norm 1.2288 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91885 tok/s
GPU: RTX 5080 (the same happens on other RTX 50 GPUs). The MFU column always shows -100.0%.
Fix:
Add the lines below to MFU.h:
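// Consumer Blackwell reference values: per-precision TFLOPS, then clock (MHz) and tensor-core count
static const PerfData BLACKWELL_CONSUMER = {74.2f, 148.3f, 148.3f, 296.6f, 593.3f, 593.3f, 1704.f, 680.f};
static GPUEntry gpu_db[] = {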
...
{"NVIDIA GeForce RTX 5090", &BLACKWELL_CONSUMER, 680, 2407},
{"NVIDIA GeForce RTX 5090 D", &BLACKWELL_CONSUMER, 680, 1704},
{"NVIDIA GeForce RTX 5080", &BLACKWELL_CONSUMER, 336, 2617},
{"NVIDIA GeForce RTX 5070 Ti", &BLACKWELL_CONSUMER, 280, 2452},
...
};
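For anyone wondering why a missing table entry shows up as -100.0%, here is a minimal sketch of how I understand the lookup to work. It is an assumption on my part, not a copy of the real header. The struct field names and the helper `bf16_flops_promised` are hypothetical. The idea is that a device name not found in the table yields -1, and a found entry scales the reference BF16 number by the core-count and clock ratios.

```c
#include <stddef.h>
#include <string.h>

/* Reference numbers for one architecture. Field names are assumed for this
   sketch: six per-precision TFLOPS values, then clock (MHz) and core count. */
typedef struct {
    float TF_32, BF_16_32, FP_16_32, FP_16_16, FP_8_32, FP_8_16;
    float CLOCK;
    float CORES;
} PerfData;

/* One table row: device name, its architecture, and its own cores and clock. */
typedef struct {
    const char* name;
    const PerfData* perf_data;
    float new_cores;
    float new_mhz;
} GPUEntry;

static const PerfData BLACKWELL_CONSUMER =
    {74.2f, 148.3f, 148.3f, 296.6f, 593.3f, 593.3f, 1704.f, 680.f};

/* Only two of the rows from the fix above, to keep the sketch short. */
static const GPUEntry gpu_db[] = {
    {"NVIDIA GeForce RTX 5090", &BLACKWELL_CONSUMER, 680, 2407},
    {"NVIDIA GeForce RTX 5080", &BLACKWELL_CONSUMER, 336, 2617},
};

/* Hypothetical helper: promised BF16 TFLOPS for a device name,
   or -1.0f when the name is not in the table. */
static float bf16_flops_promised(const char* device) {
    size_t n = sizeof(gpu_db) / sizeof(gpu_db[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(gpu_db[i].name, device) == 0) {
            const PerfData* p = gpu_db[i].perf_data;
            /* Scale the reference value by core-count and clock ratios. */
            return p->BF_16_32
                 * (gpu_db[i].new_cores / p->CORES)
                 * (gpu_db[i].new_mhz / p->CLOCK);
        }
    }
    return -1.0f; /* unknown GPU: the caller has no peak to divide by */
}
```

Under these assumptions, the RTX 5080 row works out to roughly 112.5 TFLOPS (148.3 × 336/680 × 2617/1704), while any name missing from the table returns -1, which is consistent with the -100.0% in the log above.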