huggingface / nanotron

Minimalistic large language model 3D-parallelism training

`nanotron/the-pile-for-doremi` is empty

Tonyhao96 opened this issue · comments

https://huggingface.co/datasets/nanotron/the-pile-for-doremi/tree/main

I cannot download the dataset, could you please check it?

@Tonyhao96 Hey. There was a bug in our DoReMi implementation. We have fixed it [link] (not merged yet) and reran the experiment on Fineweb2 (an upcoming dataset release) and a few other domains from The Pile and The Stack... So the previous experiment results are not legit. Stay tuned for the new release!!

For the new experiment results, check the last images in this tweet: https://twitter.com/xariusrke/status/1774089131351584852