Can't train in FP16 on Turing
jafioti opened this issue
Hi,
I have a Turing card (a 2080 Super) and I'm trying to train in FP16. I can't use BF16 because the card doesn't support it, and when I try FP16 I get the error `build_from_checkpoint() does not support fp16 right now`. Is there any way to initialize the weights randomly instead of building from a checkpoint? My understanding is that weight initialization is currently handled only by the Python script, and the C code can only load a checkpoint.
I somewhat solved this by adding fp16 export to the Python script (a simplified sketch of the change is at the end of this post). Training now runs without cuDNN, albeit with an increasing loss. With cuDNN enabled, I get:
W! CuDNN (v90300 75) function cudnnBackendFinalize() called:
w! Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: userGraph->getEntranceNodesSize() != 2
w! Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: numUserNodes != 5 && numUserNodes != 6
w! Time: 2024-08-24T17:17:17.366897 (0d+0h+0m+0s since start)
w! Process=30528; Thread=30528; GPU=NULL; Handle=NULL; StreamId=NULL.
[CUDNN ERROR] at file llmc/cudnn_att.cpp:137:
[cudnn_frontend] Error: No execution plans support the graph.
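For reference, the export change I made is essentially the following. This is a minimal sketch, not the real patch: the magic number, header fields, and the `write_fp16_checkpoint` name are placeholders I made up, and a real implementation has to match the exact .bin layout that llm.c's train_gpt2.py writes and the C side reads.

```python
# Sketch of exporting model weights as fp16 instead of fp32.
# NOTE: the header layout and magic number below are illustrative only;
# the actual llm.c checkpoint format is defined in train_gpt2.py and
# must match what the C loader expects field-for-field.
import struct

import torch

def write_fp16_checkpoint(state_dict, path):
    """Write every tensor as raw little-endian fp16 bytes after a tiny header."""
    with open(path, "wb") as f:
        # hypothetical header: (magic, version, tensor count)
        f.write(struct.pack("<iii", 20240824, 1, len(state_dict)))
        for name, tensor in state_dict.items():
            # the key step: cast each parameter to half precision before dumping
            half = tensor.detach().cpu().to(torch.float16).contiguous()
            f.write(half.numpy().tobytes())

if __name__ == "__main__":
    # since the Python side owns initialization, exporting a freshly
    # initialized model works the same as exporting a trained checkpoint
    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.GELU())
    write_fp16_checkpoint(model.state_dict(), "model_fp16.bin")
```

That also mostly answers my random-init question: because initialization lives in the Python script, a freshly initialized model can be exported in fp16 the same way as a trained one. The cuDNN failure above looks like a separate problem with building the fp16 attention graph.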