luyug / Condenser

EMNLP 2021 - Pre-training architectures for dense retrieval


Whole word masking for RoBERTa

eugene-yang opened this issue

Can you elaborate on why the first token index is appended as a bare integer instead of as `[i]` on line 65?
If the first word is split by BPE, this seems to result in an uncaught exception when processing the following token.

cand_indexes.append(0)
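
For context, here is a minimal sketch of the usual whole-word-masking index collection, assuming a RoBERTa-style BPE tokenizer where word-initial pieces start with "Ġ" and continuation pieces do not (the token list and loop structure below are illustrative, not the repo's exact code):

```python
# Hypothetical BPE output: "Token" splits into "To" + "ken".
tokens = ["To", "ken", "Ġized", "Ġtext"]

cand_indexes = []
for i, token in enumerate(tokens):
    if cand_indexes and i > 0 and not token.startswith("Ġ"):
        # Continuation piece: extend the current word's index group.
        # If the first entry were stored as a bare int (cand_indexes.append(0)),
        # this line would raise AttributeError for the second piece of the word.
        cand_indexes[-1].append(i)
    else:
        # New word: start a fresh index group as a list, i.e. [i], not i.
        cand_indexes.append([i])

print(cand_indexes)  # [[0, 1], [2], [3]]
```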

What exception are you seeing?

It is complaining that an integer doesn't have the method `.append`.
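
This is the standard Python failure when a bare int sits where a list is expected; a minimal reproduction (with hypothetical values):

```python
cand_indexes = [0]          # first word's index stored as a bare int
cand_indexes[-1].append(1)  # AttributeError: 'int' object has no attribute 'append'
```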

Right, that probably needs to be fixed.

Fixed at TOT (top of tree).