stanford-crfm / mistral

Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.

Make sure we don't write anything (of significant size) to /tmp

dlwh opened this issue

Per JohnT, HF Datasets is apparently still writing some files to /tmp, which causes issues on CodaLab (which has a small /tmp).

My best guess is that we're running some transform on a dataset without specifying cache files, but I'm not sure where.
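One way to guard against this, as a rough sketch rather than the fix that actually landed: redirect both Python's temp directory and the HF Datasets cache to a larger disk before `datasets` is imported. `TMPDIR` is honored by Python's `tempfile` module and `HF_DATASETS_CACHE` is the environment variable the `datasets` library reads for its cache location. The helper name `redirect_scratch_dirs` and the subdirectory names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def redirect_scratch_dirs(root):
    """Point temp files and the HF Datasets cache at `root` instead of /tmp.

    Hypothetical helper: the name and directory layout are assumptions,
    not part of Mistral or of the `datasets` library.
    """
    root = Path(root).expanduser().resolve()
    tmp_dir = root / "tmp"
    cache_dir = root / "hf_datasets_cache"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # tempfile consults TMPDIR, but caches its first answer in
    # tempfile.tempdir, so reset the cached value as well.
    os.environ["TMPDIR"] = str(tmp_dir)
    tempfile.tempdir = None

    # Must be set before `import datasets`, which reads it at import time.
    os.environ["HF_DATASETS_CACHE"] = str(cache_dir)
    return tmp_dir, cache_dir
```

Call this at the very top of the entry point, before anything imports `datasets`. For an individual transform, passing an explicit `cache_file_name=` to `Dataset.map(...)` is another way to pin where its output is written; whether either approach matches what was done here isn't stated in the thread.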

OK, pretty sure this is done.