Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Home Page: https://lightning.ai


Auto precision

rasbt opened this issue · comments

One small issue I see with the current config files is that we use bf16-true. In my opinion this is the recommended setting, but certain hardware doesn't support it. In that case we could recommend passing --precision 16-true on the command line. However, maybe we could instead have an "auto" option in the config files, similar to Ollama. I think we already support that via

def get_default_supported_precision(training: bool) -> str:

We would just need to set the value to null in the config file, and then document that bf16-true is used when supported and 16-true otherwise?
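For illustration, here is a minimal sketch of how that fallback could be wired up, assuming a null precision in the config arrives as None and that the existing helper lives in litgpt.utils (the resolve_precision name is hypothetical):

from typing import Optional
from litgpt.utils import get_default_supported_precision  # existing helper in the repo

def resolve_precision(precision: Optional[str], training: bool) -> str:
    # `precision: null` in the YAML config arrives here as None
    if precision is None:
        # picks a bf16 flavor when the hardware supports it, otherwise an fp16 flavor
        return get_default_supported_precision(training=training)
    return precision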

This would be an issue for reproducibility: training is not guaranteed to give the same results or be stable across precisions. I recommend running it first to show that it converges well.

To avoid having an ambiguous "null" value, we could also raise an error telling the user to explicitly select --precision 16-true if bfloat16 is not supported.
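A hedged sketch of that fail-fast check (validate_precision is a hypothetical helper; torch.cuda.is_bf16_supported() is an existing PyTorch API):

import torch

def validate_precision(precision: str) -> None:
    # Error out instead of silently downgrading to fp16
    if precision.startswith("bf16") and torch.cuda.is_available() and not torch.cuda.is_bf16_supported():
        raise ValueError("bfloat16 is not supported on this GPU; re-run with --precision 16-true.")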

That's fair; we would have to run the script with both fp16 and bf16. But this is not much different from saying "if your GPU does not support --precision bf16-true, run the script with --precision 16-true".

Maybe we should add something like:

"If your GPU is not compatible with --precision bf16-true, you can execute the script using --precision 16-true instead. However, be aware that this adjustment may lead to a decline in performance and the outcomes may vary from the reported results."