prepare_load.py doesn't filter out data with len(input_ids) < chunk_size like dataset.py does
dumpmemory opened this issue
I found that the logic in prepare_load.py differs from dataset.py: prepare_load.py doesn't filter out samples where len(input_ids) < chunk_size, as is done in
mengzi-retrieval-lm/train/dataset.py
Line 43 in 9e370ee
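
To illustrate the kind of filter being described, here is a minimal sketch. It assumes each sample is a dict with an `input_ids` list; `filter_short_samples` is a hypothetical helper name, not a function from the repository, and the actual code at the referenced line may look different.

```python
from typing import Dict, List


def filter_short_samples(
    samples: List[Dict[str, List[int]]], chunk_size: int
) -> List[Dict[str, List[int]]]:
    """Keep only samples with at least chunk_size tokens.

    Hypothetical helper: drops every sample where
    len(input_ids) < chunk_size, the condition named in this issue.
    """
    return [s for s in samples if len(s["input_ids"]) >= chunk_size]


# Toy data (made up for illustration, not from the real dataset)
samples = [
    {"input_ids": [1, 2, 3]},                    # shorter than chunk_size, dropped
    {"input_ids": [1, 2, 3, 4]},                 # exactly chunk_size, kept
    {"input_ids": [1, 2, 3, 4, 5, 6, 7, 8]},     # longer than chunk_size, kept
]
kept = filter_short_samples(samples, chunk_size=4)
```

If prepare_load.py skips a check like this, samples shorter than one chunk would reach the later steps, while dataset.py would have discarded them.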