Unable to preprocess data for summarization
samiksome92 opened this issue
I followed these instructions:
git clone https://github.com/dropreg/R-Drop.git
cd R-Drop/fairseq_src/
pip install --editable .
and then tried to preprocess the data for summarization by running:
bash script/preprocess.sh
However, I get the following error:
/users/gpu/samiks/anaconda3/envs/rdrop/bin/python: No module named examples.roberta.multiprocessing_bpe_encoder
It seems multiprocessing_bpe_encoder is missing from this repo. Are we supposed to run the preprocessing with a separate fairseq install?
Reinstalling fairseq and apex fixed this problem for me.
The file "multiprocessing_bpe_encoder.py" is one of fairseq's RoBERTa example scripts. You can download it with:
wget https://raw.githubusercontent.com/pytorch/fairseq/main/examples/roberta/multiprocessing_bpe_encoder.py
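Where the downloaded file ends up probably matters. The error format above is what `python -m` prints for a missing module, so the script most likely runs `python -m examples.roberta.multiprocessing_bpe_encoder`, which resolves relative to the current directory. A rough sketch of a check, assuming preprocess.sh is run from R-Drop/fairseq_src/ and does not change directory first (the script itself is not shown in this thread); `check_bpe_encoder` is a made-up helper name, not part of R-Drop or fairseq:

```shell
# Hypothetical helper: report whether the encoder file sits where
# `python -m examples.roberta.multiprocessing_bpe_encoder` would look for it.
# $1 is the directory preprocess.sh is run from (assumed: R-Drop/fairseq_src).
check_bpe_encoder() {
  if [ -f "$1/examples/roberta/multiprocessing_bpe_encoder.py" ]; then
    echo "found"
  else
    echo "missing"
  fi
}

# Check the current directory.
check_bpe_encoder .
```

If this prints "missing", creating examples/roberta/ under that directory and moving the downloaded file into it may be enough; otherwise a separate fairseq checkout that already has the examples directory should work too.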