The load_docs_from_filepath method in src/task/pretrain_roberta/train.py just returns an empty list.

HiroshigeAoki opened this issue 3 years ago.

The load_docs_from_filepath method in src/task/pretrain_roberta/train.py only returns an empty list.
Is this the intended behavior?
Thank you.

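def load_docs_from_filepath(filepath, tokenizer):
    # Documents are separated by blank lines; each non-blank line is one sentence.
    docs = []
    with open(filepath, encoding="utf-8") as f:
        doc = []
        for line in f:
            line = line.strip()
            if line == "":
                # A blank line closes the current document (if it has any sentences).
                if len(doc) > 0:
                    docs.append(doc)
                doc = []
            else:
                # Tokenize the sentence and keep it only if it yields at least one token id.
                sent = line
                tokens = tokenizer.tokenize(sent)
                token_ids = tokenizer.convert_tokens_to_ids(tokens)
                if len(token_ids) > 0:
                    doc.append(token_ids)
    # A final document that is not followed by a blank line is never appended.
    return docs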
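Because a document is only appended when a blank line is reached, a file whose last document has no blank line after it loses that document, and a file with no blank lines at all yields an empty list. This is one plausible way to get `[]`; the thread itself does not say what the actual cause was. The sketch below reproduces the behavior with the same loop logic and a hypothetical whitespace tokenizer (`ToyTokenizer` and `count_docs` are illustrative names, not part of the repository or of any tokenizer library).

```python
import os
import tempfile


class ToyTokenizer:
    """Hypothetical stand-in for the real tokenizer: splits on whitespace
    and assigns each new token the next integer id."""

    def __init__(self):
        self.vocab = {}

    def tokenize(self, text):
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in tokens]


def load_docs_from_filepath(filepath, tokenizer):
    # Same loop logic as the function quoted in the issue.
    docs = []
    with open(filepath, encoding="utf-8") as f:
        doc = []
        for line in f:
            line = line.strip()
            if line == "":
                if len(doc) > 0:
                    docs.append(doc)
                doc = []
            else:
                token_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(line))
                if len(token_ids) > 0:
                    doc.append(token_ids)
    return docs


def count_docs(text):
    # Write the text to a temporary file and count the documents loaded from it.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        return len(load_docs_from_filepath(path, ToyTokenizer()))
    finally:
        os.remove(path)


print(count_docs("a b\nc d\n\ne f\n\n"))  # 2: every document is followed by a blank line
print(count_docs("a b\nc d\n\ne f\n"))    # 1: the final document is never appended
print(count_docs("a b\nc d\n"))           # 0: no blank line at all, so an empty list
```

If this turns out to be the cause, ending the input file with a blank line (or flushing a non-empty `doc` after the loop in a local copy) keeps the last document.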

Zhao Tianyu · Answer 1 · Thu Nov 04 2021 09:03:55 GMT+0800 (China Standard Time)

Hi, this is not supposed to happen.
Please check the content of the file at filepath. If it is not empty, please paste some lines of it here so we can better understand what is happening.
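A quick way to check the file before pasting it is to summarize the blank-line structure the loader depends on and print the first few lines with `repr` so that stray whitespace is visible. This is a minimal sketch; `describe_corpus_lines` and `describe_corpus_file` are hypothetical helper names, not part of the repository.

```python
import sys


def describe_corpus_lines(lines):
    """Summarize the blank-line structure that load_docs_from_filepath relies on."""
    blank = sum(1 for line in lines if line.strip() == "")
    return {
        "lines": len(lines),
        "non_blank_lines": len(lines) - blank,
        "blank_lines": blank,
        # The loader keeps the last document only if a blank line follows it.
        "ends_with_blank_line": bool(lines) and lines[-1].strip() == "",
    }


def describe_corpus_file(filepath, preview=5):
    # Read lines the same way the loader does (text mode, UTF-8).
    with open(filepath, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    print(describe_corpus_lines(lines))
    # Show the first few lines verbatim, e.g. to paste into the issue.
    for line in lines[:preview]:
        print(repr(line))


if __name__ == "__main__" and len(sys.argv) > 1:
    describe_corpus_file(sys.argv[1])
```

A result with `"blank_lines": 0` (or a zero `"non_blank_lines"`) would already explain an empty list.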

Hiroshige Aoki · Answer 2 · Thu Nov 04 2021 15:53:03 GMT+0800 (China Standard Time)

This was my fault. Sorry for the trouble.