salesforce / CodeGen

CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

BigQuery dataset

ValeKnappich opened this issue

Hi, first of all, great work!

Is there any chance you could provide more details on the BigQuery dataset / subset? Perhaps a list of the repositories used?
It would be great to have this in order to avoid data leakage in experiments.

Cheers

Hi, could you please provide this information? Or at least the time frame for the collected dataset?