Yi Du's starred repositories
AI_for_Science_paper_collection
A list of AI for Science papers accepted by top conferences
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Awesome-Scientific-Language-Models
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
ResponsibleNLP
Repository for research in the field of Responsible NLP at Meta.
awesome-fairness-papers
Papers on fairness in NLP
Chemical-Data-Download
Download datasets (MP, OQMD, AFLOW, JARVIS, etc.) using Matminer, RESTful APIs, and AFLUX
paperswithcode-client
API Client for paperswithcode.com
Reduced_Reused_Recycled
GitHub repository for "Reduced, Reused and Recycled" (NeurIPS 2021 Best Paper, D&B Track)
Awesome-LLMs-Datasets
A summary of existing representative text datasets for LLMs.
awesome-active-learning
A curated list of awesome Active Learning resources
open-images-dataset
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
datacardsplaybook
The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.
broad_twitter_corpus
The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016)
promptsource
Toolkit for creating, sharing and using natural language prompts.
s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)