karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Can't train in FP16 on Turing

jafioti opened this issue

Hi,
I have a Turing card (2080 Super) and I'm trying to run training in FP16. I can't run in BF16 because the card doesn't support it, and when I try to run in FP16 I get: build_from_checkpoint() does not support fp16 right now. Is there any way to initialize the weights randomly instead of building from a checkpoint? My understanding is that weight initialization is currently handled by the Python script, and the C code can only load a checkpoint.

I partially worked around this by adding FP16 export to the Python script. Training now runs without cuDNN (albeit with an increasing loss). With cuDNN enabled, I get:

W! CuDNN (v90300 75) function cudnnBackendFinalize() called:
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: userGraph->getEntranceNodesSize() != 2
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: numUserNodes != 5 && numUserNodes != 6
w! Time: 2024-08-24T17:17:17.366897 (0d+0h+0m+0s since start)
w! Process=30528; Thread=30528; GPU=NULL; Handle=NULL; StreamId=NULL.

[CUDNN ERROR] at file llmc/cudnn_att.cpp:137:
[cudnn_frontend] Error: No execution plans support the graph.
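
For reference, the export change I made is essentially the sketch below. It mirrors the existing float32/bfloat16 export path in train_gpt2.py, but the header layout, the magic value, the version id I picked for FP16, and the write_fp16 helper name are my own assumptions, not the repo's exact code:

import torch

def write_fp16(tensor, file):
    # flatten, cast to fp16, and dump raw bytes (numpy supports float16)
    t = tensor.detach().cpu().to(torch.float16)
    file.write(t.numpy().tobytes())

def write_model_fp16(state_dict, filename):
    # 256 int32 header, following the style of the fp32/bf16 exports
    header = torch.zeros(256, dtype=torch.int32)
    header[0] = 20240326   # magic used by the existing llm.c checkpoints
    header[1] = 4          # hypothetical version id I chose for fp16
    with open(filename, "wb") as f:
        f.write(header.numpy().tobytes())
        for name, tensor in state_dict.items():
            write_fp16(tensor, f)

The C side then needs a matching version check in its checkpoint loader, which is presumably where build_from_checkpoint() currently rejects fp16.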