EleutherAI / the-pile

tfds_pile

everks opened this issue · comments

I manually downloaded the Pile dataset and tried to use pile_tfds.py to create a TensorFlow dataset. I found that the _read_fn of PileReader only adds text to result when text is a list, but the actual format appears to be str. So maybe result['text'] = text should be outside the if statement?

if isinstance(text, list):
    text = self.para_joiner.join(text)
    result['text'] = text
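
A minimal sketch of the change I have in mind, written as a standalone function because I have not reproduced the rest of PileReader here. The name build_result and the default para_joiner value are my own placeholders, not anything from pile_tfds.py; only the isinstance check and the result['text'] assignment come from the snippet above.

```python
def build_result(text, para_joiner="\n\n"):
    """Hypothetical stand-in for the part of _read_fn quoted above.

    `para_joiner` mimics self.para_joiner; its real value is an assumption.
    """
    result = {}
    if isinstance(text, list):
        # A list of paragraphs is joined into a single string.
        text = para_joiner.join(text)
    # The assignment is outside the if, so plain str input is kept as well.
    result['text'] = text
    return result


if __name__ == "__main__":
    print(build_result("a single document"))
    print(build_result(["first paragraph", "second paragraph"]))
```

With the assignment dedented like this, both a str and a list of str end up in result['text'], which is the behavior I expected from the reader.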