yi du's starred repositories
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Awesome-Scientific-Language-Models
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
ResponsibleNLP
Repository for research in the field of Responsible NLP at Meta.
awesome-fairness-papers
Papers on fairness in NLP
Chemical-Data-Download
Download datasets (MP, OQMD, AFLOW, JARVIS, etc.) using Matminer, RESTful APIs, and AFLUX
paperswithcode-client
API Client for paperswithcode.com
Reduced_Reused_Recycled
GitHub repository for "Reduced, Reused and Recycled" (NeurIPS 2021 Best Paper, D&B Track)
Awesome-LLMs-Datasets
A summary of existing representative LLM text datasets.
awesome-active-learning
A curated list of awesome active learning resources
open-images-dataset
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
datacardsplaybook
The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.
broad_twitter_corpus
The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016)
promptsource
Toolkit for creating, sharing and using natural language prompts.
s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
BestPractices
Things that you should (and should not) do in your Materials Informatics research.
Huatuo-Llama-Med-Chinese
Repo for BenTsao [original name: HuaTuo (华驼)], instruction-tuning large language models with Chinese medical knowledge.
the-algorithm
Source code for Twitter's Recommendation Algorithm