Auto precision
rasbt opened this issue
One small issue I see with the current config files is that we are using `bf16-true`. This is what I'd recommend, but certain hardware doesn't support it. In that case, we could recommend passing `--precision 16-true` on the command line. However, maybe we could have an "auto" setting in the config files, similar to Ollama. I think we already support that via line 284 in f241d94: we would just need to set the precision to `null` in the config file and then document that `bf16-true` is used when supported and `16-true` otherwise?
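A minimal sketch of what the "auto" behavior could look like, assuming the config's `null` arrives in Python as `None`. `resolve_precision` and `detect_bf16_support` are hypothetical helpers for illustration, not the code at line 284. Detection uses PyTorch's `torch.cuda.is_bf16_supported()` and falls back to `False` when PyTorch or CUDA is unavailable.

```python
from typing import Optional


def detect_bf16_support() -> bool:
    """Hypothetical helper: True only if a CUDA device with bfloat16 support is present."""
    try:
        import torch
    except ImportError:
        # No PyTorch installed: treat bf16 as unsupported.
        return False
    return bool(torch.cuda.is_available() and torch.cuda.is_bf16_supported())


def resolve_precision(precision: Optional[str], bf16_supported: bool) -> str:
    """Hypothetical helper: map a null ("auto") config value to a concrete precision.

    An explicit value from the config or command line is returned unchanged.
    """
    if precision is not None:
        return precision
    return "bf16-true" if bf16_supported else "16-true"


# Usage: a config with `precision: null` resolves based on the current hardware.
print(resolve_precision(None, detect_bf16_support()))
```

Passing the support flag in as an argument keeps the resolution logic testable without a GPU.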
This will be an issue for reproducibility. There is no guarantee that training will give the same results or remain stable. I recommend running it with `16-true` first to show that it converges well.
To avoid having an ambiguous `null` value, we could also raise an error and tell the user to explicitly select `precision=16-true` if bfloat16 is not supported.
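The stricter alternative could look like this: the config keeps `bf16-true` explicit, and the script fails early with an actionable message on hardware without bfloat16. `check_precision` is a hypothetical helper; the `bf16_supported` flag would come from a detection step such as `torch.cuda.is_bf16_supported()`.

```python
def check_precision(precision: str, bf16_supported: bool) -> str:
    """Hypothetical helper: reject bf16 precisions on hardware that lacks bfloat16."""
    if precision.startswith("bf16") and not bf16_supported:
        raise ValueError(
            f"precision={precision!r} requires bfloat16 support, which this device lacks. "
            "Please set it explicitly, e.g. --precision 16-true."
        )
    return precision


# Usage: an explicit, supported setting passes through unchanged.
print(check_precision("16-true", bf16_supported=False))
```

This keeps the precision of every run visible in its config or command line, rather than resolved silently at runtime.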
That's fair, we would have to run the script with both fp16 and bf16. But this is not that different from saying "if your GPU does not support `--precision bf16-true`, run the script with `--precision 16-true`".
Maybe we should add something like:
"If your GPU does not support `--precision bf16-true`, you can run the script with `--precision 16-true` instead. However, be aware that this change may reduce performance and that the results may differ from the reported ones."