EleutherAI / the-pile

Paper checklist

leogao2 opened this issue

A checklist for things we need to get done on the paper, prioritized.

Must do:

  • Train 6B on Pile and report perplexity (@sdtblck)
  • Datasheets (@StellaAthena)
  • Set up webpage (see #69)
  • Write up announcement blog post (@leogao2)
  • Transfer Pile to The Eye (@leogao2)
  • Implement in HF transformers (@leogao2)
  • Finish writing up the paper (everyone)

Nice to have:

  • Perform profanity analysis (@anishthite)
  • Perform language analysis (@leogao2)
  • Perform topic analysis (@cfoster0)
  • Perform n-gram analysis (@researcher2)
  • Report GPT-3 (and other pretrained models) Pile validation perplexity (@zphang)
  • Perform other analyses we think of
  • Train 1.5B on Pile and report Pile validation perplexity (@sdtblck / @anishthite)
  • Train 117M on Pile and report Pile validation perplexity (@sdtblck / @anishthite)
  • Report evaluation scores of {6B, 1.5B, 117M} trained on Pile on as many evaluations as possible (implemented in lm_eval_harness)
  • Design a logo for Pile

Wishlist:

  • Train 6B on CC and report Pile validation perplexity (@sdtblck)
  • Train 1.5B on CC and report Pile validation perplexity (@sdtblck / @anishthite)
  • Train 117M on CC and report Pile validation perplexity (@sdtblck / @anishthite)
  • Report evaluation scores of {6B, 1.5B, 117M} trained on CC on as many evaluations as possible (implemented in lm_eval_harness)