Improve dataloading.
GeorgiosSmyrnis opened this issue
GeorgiosSmyrnis commented
Some items that need to be addressed:
- Clean up the code in `data.py`.
- Make `--dataset-resampled` and `--dataset-manifest` the only possible options.
- Make `--accurate-total-tokens` the default.
Achal Dave commented
Should we close this @GeorgiosSmyrnis? :)
GeorgiosSmyrnis commented
Improvement is a continuous process :)
But I agree, closing this thanks to #111 being merged.