[transformers/slimpajama] where is lm_dataformat?
tbarton16 opened this issue
I am trying to deduplicate some data. When I run `main.py`, I get the error `ModuleNotFoundError: No module named 'lm_dataformat'`. Looking around the module, `lm_dataformat` does not seem to be included in the public version. I would appreciate help with this.
Hi @tbarton16, thanks for catching this issue. We'll update our requirements.txt. Meanwhile, did you try `pip install lm_dataformat` in your environment?
I installed lm_dataformat, and `python -c 'import lm_dataformat'` runs fine. Running `main.py` still produces `ModuleNotFoundError: No module named 'lm_dataformat.lm_dataformat'`.
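One way to see why `import lm_dataformat` succeeds while `lm_dataformat.lm_dataformat` fails is to ask Python where each name resolves in the same environment that runs `main.py`. Below is a minimal sketch using only the standard library (`importlib.util.find_spec`). The helper name `locate_module` is hypothetical and is not part of modelzoo or lm_dataformat.

```python
import importlib.util


def locate_module(name):
    """Return where `name` would be imported from, or None if it cannot be found.

    Hypothetical diagnostic helper. For dotted names, find_spec imports the
    parent package first, so a missing parent raises ModuleNotFoundError;
    that case is reported as None as well.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ModuleNotFoundError, ValueError):
        return None
    if spec is None:
        return None
    # Regular modules and packages have a file origin.
    if spec.origin not in (None, "namespace"):
        return spec.origin
    # Namespace packages (directories without __init__.py) only have
    # search locations, so return those instead.
    return list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    for mod in ("lm_dataformat", "lm_dataformat.lm_dataformat"):
        print(mod, "->", locate_module(mod))
```

If the first name prints a path and the second prints `None`, the installed package does not provide the nested `lm_dataformat.lm_dataformat` module that the error message is looking for.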
Unless you have a different version, the line should be `from lm_dataformat import Reader`, not the import at https://github.com/Cerebras/modelzoo/blob/main/modelzoo/transformers/data_processing/slimpajama/preprocessing/filter.py#L14
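If the script should work with either layout (the nested `lm_dataformat.lm_dataformat` module that the error message points to, or the top-level `from lm_dataformat import Reader` suggested above), one option is to try the imports in order. This is a sketch under that assumption, not the repository's actual code; `import_first` is a hypothetical helper.

```python
import importlib


def import_first(candidates):
    """Return the attribute from the first (module, attribute) pair that imports.

    Hypothetical helper: candidates are tried in order, and ImportError
    (which includes ModuleNotFoundError) or a missing attribute moves on
    to the next pair. If nothing works, every failure is reported.
    """
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Hypothetical use in filter.py, accepting both layouts discussed in this thread:
# Reader = import_first([
#     ("lm_dataformat.lm_dataformat", "Reader"),
#     ("lm_dataformat", "Reader"),
# ])
```

A plain `try`/`except ImportError` around the two import statements does the same job; the helper only keeps the fallback list in one place and lists every failure if none of the imports works.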
Hi @tbarton16, can you run `git clone git@github.com:leogao2/lm_dataformat.git` inside the slimpajama directory (https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/data_processing/slimpajama)? That should resolve your issue.
Thanks.