FP8 not working
prigoyal opened this issue · comments
Hello, this is a follow-up to an earlier issue we reported. We are unable to run a simple MPT-1B FP8 baseline.
We have done due diligence identifying compatible dependency versions and are sharing below what we have tried. We also use the llm-foundry Docker images, and we share our Dockerfiles, error logs, config files, and every other detail. We would greatly appreciate any insight into what we are missing.
cc @growlix
Please scroll the table horizontally to see the Build and Runtime status columns.
| llm-foundry | composer | pytorch | cuda | TransformerEngine | Flash-attn required | Flash-attn version used | Build | Runtime |
|---|---|---|---|---|---|---|---|---|
| 0.3.0 | >=0.16.3, <0.17 | 2.0.1 | 11.8 | v0.10 | >=1.0.6, <=1.0.7 | 1.0.7 | | |
| 0.3.0 | >=0.16.3, <0.17 | 2.0.1 | 11.8 | v0.12 | >=1.0.6, <=2.0.4 | 1.0.7 | ✅ | ❌ same error as job935 log |
| 0.3.0 | >=0.16.3, <0.17 | 2.0.1 | 11.8 | stable | >=1.0.6, <=2.3.3, !=2.0.9, !=2.1.0 | 1.0.7 | ✅ | ❌ different API error |
| 0.4.0 | >=0.17, <0.18 | 2.0.1 | 11.8 | main | >=2.0.6, <=2.4.2, !=2.0.9, !=2.1.0 | 2.4.2 | ✅ docker | ❌ different API error |
| 0.4.0 | >=0.17, <0.18 | 2.0.1 | 11.8 | v0.10 | >=1.0.6, <=1.0.7 | 1.0.7 | | |
| 0.4.0 | >=0.17, <0.18 | 2.0.1 | 11.8 | v0.12 | >=1.0.6, <=2.0.4 | 1.0.7 | ✅ docker, yaml config | ❌ job935 log (same issue we reported in #885) |
| 0.4.0 | >=0.17, <0.18 | 2.1.0 | 12.1 | v0.10 | >=1.0.6, <=1.0.7 | | | |
| 0.4.0 | >=0.17, <0.18 | 2.1.0 | 12.1 | v0.12 | >=1.0.6, <=2.0.4 | | | |
| 0.4.0 | >=0.17, <0.18 | 2.1.0 | 12.1 | main | >=2.0.6, <=2.4.2, !=2.0.9, !=2.1.0 | 2.4.2 | ✅ docker | ❌ initial error (log), resolved with init_device: cpu, but then hit the same error as the job935 log |
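For anyone reproducing this matrix, here is a minimal sketch of how we sanity-check whether an installed flash-attn version falls inside the range a given TransformerEngine ref requires. The ranges are copied from the table above; the function name and dictionary are our own illustration, not part of any library.

```python
# Sketch: check an installed flash-attn version against the required range
# for a TransformerEngine ref. Ranges taken from the compatibility table above.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Required flash-attn ranges per TransformerEngine ref (illustrative mapping).
REQUIRED = {
    "v0.10": SpecifierSet(">=1.0.6,<=1.0.7"),
    "v0.12": SpecifierSet(">=1.0.6,<=2.0.4"),
    "main": SpecifierSet(">=2.0.6,<=2.4.2,!=2.0.9,!=2.1.0"),
}

def flash_attn_ok(te_ref: str, installed: str) -> bool:
    """True if the installed flash-attn version satisfies the TE ref's range."""
    return Version(installed) in REQUIRED[te_ref]

print(flash_attn_ok("v0.12", "1.0.7"))  # True
print(flash_attn_ok("main", "2.0.9"))   # False: explicitly excluded
```

This catches mismatches (e.g. flash-attn 2.0.9, which is excluded by the main ref) before a long docker build.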
Hi @j316chuck, just flagging this follow-up issue in case you can help!
Update: we removed the following from our config:

```yaml
model:
  fc_type: te
  ffn_config_defaults:
    ffn_type: te_ln_mlp
```

which solved it for us.
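For anyone hitting the same thing, a minimal sketch of the change. Note the "after" values are an assumption on our part: removing the keys falls back to llm-foundry's defaults, which we believe are `fc_type: torch` and `ffn_type: mptmlp`; other config keys are omitted here.

```yaml
# Before: route linear/MLP layers through TransformerEngine (triggered the error for us)
model:
  fc_type: te
  ffn_config_defaults:
    ffn_type: te_ln_mlp

# After: delete the keys above to fall back to the defaults
# (assumed to be equivalent to the following)
model:
  fc_type: torch
  ffn_config_defaults:
    ffn_type: mptmlp
```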