karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Wrong MFU reported on Blackwell GPUs

edisonchan opened this issue

Steps to reproduce:

USE_CUDNN=1 make all

./train_gpt2cu

---
step   61/74 | loss 3.208939 (+nanz)| norm 1.4800 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91886 tok/s
step   62/74 | loss 3.464926 (+nanz)| norm 1.3662 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91886 tok/s
step   63/74 | loss 3.402215 (+nanz)| norm 1.3542 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91886 tok/s
step   64/74 | loss 3.407495 (+nanz)| norm 1.2991 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91887 tok/s
step   65/74 | loss 3.596000 (+nanz)| norm 1.4701 (+nanz)| lr 3.00e-04 | 44.55 ms | -100.0% bf16 MFU | 91890 tok/s
step   66/74 | loss 3.038379 (+nanz)| norm 1.3047 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91889 tok/s
step   67/74 | loss 3.288985 (+nanz)| norm 1.1935 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91890 tok/s
step   68/74 | loss 3.651558 (+nanz)| norm 1.3012 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91890 tok/s
step   69/74 | loss 3.298503 (+nanz)| norm 1.2295 (+nanz)| lr 3.00e-04 | 44.60 ms | -100.0% bf16 MFU | 91886 tok/s
step   70/74 | loss 3.651726 (+nanz)| norm 1.4792 (+nanz)| lr 3.00e-04 | 44.59 ms | -100.0% bf16 MFU | 91886 tok/s
step   71/74 | loss 3.597191 (+nanz)| norm 1.1836 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91886 tok/s
step   72/74 | loss 3.750491 (+nanz)| norm 2.1341 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91885 tok/s
step   73/74 | loss 3.828187 (+nanz)| norm 1.2098 (+nanz)| lr 3.00e-04 | 44.57 ms | -100.0% bf16 MFU | 91885 tok/s
step   74/74 | loss 3.364620 (+nanz)| norm 1.2288 (+nanz)| lr 3.00e-04 | 44.58 ms | -100.0% bf16 MFU | 91885 tok/s

GPU: RTX 5080 (other RTX 50-series GPUs are affected too).

Fix: add the following lines to MFU.h:

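// PerfData fields: TF_32, BF_16_32, FP_16_32, FP_16_16, FP_8_32, FP_8_16 (TFLOPS), CLOCK (MHz), CORES (tensor cores)
static const PerfData BLACKWELL_CONSUMER = {74.2f, 148.3f, 148.3f, 296.6f, 593.3f, 593.3f, 1704.f, 680.f};

static GPUEntry gpu_db[] = {
    // ... existing entries ...
    // {device name, perf table, tensor cores, clock in MHz}
    {"NVIDIA GeForce RTX 5090", &BLACKWELL_CONSUMER, 680, 2407},
    {"NVIDIA GeForce RTX 5090 D", &BLACKWELL_CONSUMER, 680, 1704},
    {"NVIDIA GeForce RTX 5080", &BLACKWELL_CONSUMER, 336, 2617},
    {"NVIDIA GeForce RTX 5070 Ti", &BLACKWELL_CONSUMER, 280, 2452},
    // ... existing entries ...
};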