google-research / FLAN

[BUG] get_dataset for flan2021_submix taking hours, while flan_zsnoopt takes 5m

TheExGenesis opened this issue

Not much to add, except that flan_zsnoopt loads quite quickly, whereas flan2021_submix has been running for close to 3 hours and is using 22 GB of RAM with no sign of stopping.

@TheExGenesis I suspect this is because the few-shot submixtures take much longer to generate.

@SirNeural is one person I know who has generated all the tasks in this pipeline. Maybe they have an estimate of the times? (I have only generated smaller versions with this external version of the code, with many of the datasets commented out, so I don't have an end-to-end wall-clock time.)
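Since nobody in the thread has end-to-end timings, one way to get a rough number without waiting hours is to time only the first few examples of each mixture. The helper below is a minimal sketch, not part of the FLAN code: `time_first_examples` and `fake_examples` are hypothetical names, and the factory argument stands in for whatever call builds the dataset (for example the `get_dataset` call from the issue title).

```python
import itertools
import time
from typing import Callable, Iterable, Tuple


def time_first_examples(
    make_examples: Callable[[], Iterable], limit: int = 100
) -> Tuple[int, float]:
    """Build an iterable and pull at most `limit` items from it.

    Returns (number of items read, elapsed seconds). The timer starts
    before `make_examples()` is called, so construction time is included.
    """
    start = time.perf_counter()
    count = 0
    for _ in itertools.islice(make_examples(), limit):
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for a real dataset; replace with a function
    # that builds and returns the mixture you want to measure.
    def fake_examples():
        for i in range(1000):
            yield {"inputs": f"question {i}", "targets": f"answer {i}"}

    count, elapsed = time_first_examples(fake_examples, limit=50)
    print(f"{count} examples in {elapsed:.4f}s")
```

Running this once per mixture, with the same `limit`, would show whether the slowdown is in building the dataset or in producing each example. It only gives a lower bound on the full run, since later examples may be slower than the first ones.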