Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

FP8 mixed precision via NVIDIA's Transformer Engine

carmocca opened this issue

Description & Motivation

Support https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

Pitch

Write a precision plugin using the library above, enabled via (see the usage sketch after this list):

  • precision="transformer-engine"
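As a usage illustration of the pitched flag, it could be passed like any other precision setting. Fabric is used here because, as noted further down in the thread, the required APIs would land there first; the exact argument handling is an assumption until the plugin exists.

```python
from lightning.fabric import Fabric

# Illustration of the pitched flag; routing it through Fabric is an assumption
# based on the discussion below ("we'll have it in Fabric first").
fabric = Fabric(accelerator="cuda", devices=1, precision="transformer-engine")
fabric.launch()
```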

Alternatives

Don't implement this until it's vendored by PyTorch, if that ever happens.

Additional context

No response

cc @Borda @carmocca @justusschock @awaelchli

The library only requires enabling an autocast context manager.
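For reference, a minimal sketch of that usage, following the NVIDIA Transformer Engine user guide linked above (layer sizes and the recipe settings here are arbitrary):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Arbitrary FP8 scaling recipe for the sketch.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# TE provides drop-in replacements such as te.Linear for torch.nn.Linear.
model = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(32, 768, device="cuda")

# FP8 compute is enabled by wrapping the forward pass in fp8_autocast.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
out.sum().backward()
```

Note that FP8 execution requires supported hardware, such as the Hopper-generation H100s mentioned later in this thread.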

There is one more thing. The user needs to replace their layers with the custom ones from the library. What's the plan here? Will the plugin implement the module_init_context() manager? On the other hand, one might not want to replace all layers. If this is left to the user, then there is a lot less value in adding the plugin.

Yes, we'll need to implement a replacement mechanism. The plugin can have a flag to disable it if necessary.

This also means that we'll have it in Fabric first, as these APIs do not exist in the trainer yet.

Actually, convert_module might be a better fit than init_context if we prefer replacing existing layers rather than patching the torch.nn classes.
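As a very rough sketch of what such a convert_module-style hook could do (replace_layers is a hypothetical helper, not the actual plugin code), it would walk the module tree and swap supported layers for their TE counterparts:

```python
import torch.nn as nn
import transformer_engine.pytorch as te


def replace_layers(module: nn.Module) -> nn.Module:
    # Hypothetical helper: recursively swap nn.Linear submodules
    # for Transformer Engine's te.Linear.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            te_linear = te.Linear(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            setattr(module, name, te_linear)
        else:
            replace_layers(child)
    return module
```

A real implementation would also need to copy over the existing weights, cover more layer types than nn.Linear, and, as mentioned above, expose a flag to turn the replacement off.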

Any update on support for this?

@nanand2 Our access to H100s is very limited so we haven't merged this yet. However, the branch https://github.com/Lightning-AI/lightning/tree/carmocca/transformer-engine should be usable if you want to play with it right now

Great, thanks!