Giters
bigscience-workshop/catalogue_data
Scripts to prepare catalogue data
Stargazers: 8
Watchers: 21
Issues: 5
Forks: 1
bigscience-workshop/catalogue_data Issues
S2ORC vs Arxiv vs PMC (updated 2 years ago, 6 comments)
Wiki-based dataset cleaning (updated 2 years ago, 7 comments)
Catching crawling noise + ads (updated 2 years ago, 6 comments)
Repeated lines across examples (updated 2 years ago, 3 comments)
Removing dataset lm_en_a_million_news_headlines_abc_australia (updated 2 years ago, 2 comments)