The load_docs_from_filepath method in src/task/pretrain_roberta/train.py just returns an empty list.
HiroshigeAoki opened this issue · comments
Hiroshige Aoki commented
The load_docs_from_filepath method in src/task/pretrain_roberta/train.py only returns an empty list.
Is this the intended behavior?
Thank you.
    def load_docs_from_filepath(filepath, tokenizer):
        docs = []
        with open(filepath, encoding="utf-8") as f:
            doc = []
            for line in f:
                line = line.strip()
                if line == "":
                    if len(doc) > 0:
                        docs.append(doc)
                    doc = []
                else:
                    sent = line
                    tokens = tokenizer.tokenize(sent)
                    token_ids = tokenizer.convert_tokens_to_ids(tokens)
                    if len(token_ids) > 0:
                        doc.append(token_ids)
        return docs
Zhao Tianyu commented
Hi, this is not supposed to happen.
Please check the content of the file at filepath. If it is not empty, please paste some lines of it here so we can better understand what is happening.
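When checking the file, one thing worth ruling out is its layout. As the function is quoted in this issue, a document is only appended to docs when a blank line is reached, so a non-empty file with no blank lines would still produce an empty list. The sketch below illustrates that behavior; it is not a confirmed diagnosis of this report. WhitespaceTokenizer and load_from_text are hypothetical helpers written for this illustration and are not part of the repository, and the real tokenizer will produce different ids.

```python
import os
import tempfile


class WhitespaceTokenizer:
    """Hypothetical stand-in for the real tokenizer: splits on whitespace."""

    def __init__(self):
        self.vocab = {}

    def tokenize(self, text):
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        # Assign ids in order of first appearance.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in tokens]


def load_docs_from_filepath(filepath, tokenizer):
    # Same logic as the function quoted in this issue.
    docs = []
    with open(filepath, encoding="utf-8") as f:
        doc = []
        for line in f:
            line = line.strip()
            if line == "":
                # A blank line closes the current document.
                if len(doc) > 0:
                    docs.append(doc)
                doc = []
            else:
                tokens = tokenizer.tokenize(line)
                token_ids = tokenizer.convert_tokens_to_ids(tokens)
                if len(token_ids) > 0:
                    doc.append(token_ids)
    return docs


def load_from_text(text):
    """Hypothetical helper: write text to a temp file and load it."""
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as f:
        f.write(text)
        path = f.name
    try:
        return load_docs_from_filepath(path, WhitespaceTokenizer())
    finally:
        os.remove(path)


no_blank_lines = "first sentence\nsecond sentence\n"
with_blank_lines = "first sentence\nsecond sentence\n\nthird sentence\n\n"

print(load_from_text(no_blank_lines))    # []
print(load_from_text(with_blank_lines))  # [[[0, 1], [2, 1]], [[3, 1]]]
```

The same reasoning implies that a final document not followed by a blank line is dropped, so in this sketch the input needs a trailing blank line for the last document to be returned.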
Hiroshige Aoki commented
This was my fault. I'm sorry to interrupt you...