yi du's starred repositories

AI_for_Science_paper_collection

List the AI for Science papers accepted by top conferences

Language: Python | License: Apache-2.0 | Stargazers: 40 | Issues: 0
Language: HTML | License: Apache-2.0 | Stargazers: 3 | Issues: 0

AIRS

Artificial Intelligence Research for Science (AIRS)

Language: Python | License: GPL-3.0 | Stargazers: 469 | Issues: 0

science

https://mlcommons.org/en/groups/research-science/

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 10 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 35994 | Issues: 0

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python | License: NOASSERTION | Stargazers: 626 | Issues: 0

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Language: Python | License: MIT | Stargazers: 3494 | Issues: 0

Awesome-Scientific-Language-Models

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

License: MIT | Stargazers: 398 | Issues: 0

ResponsibleNLP

Repository for research in the field of Responsible NLP at Meta.

Language: Python | License: NOASSERTION | Stargazers: 179 | Issues: 0

awesome-fairness-papers

Papers on fairness in NLP

Stargazers: 419 | Issues: 0

nomad

NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.

Language: JavaScript | License: Apache-2.0 | Stargazers: 66 | Issues: 0

Chemical-Data-Download

Download datasets (MP, OQMD, AFLOW, JARVIS, etc.) using Matminer, RESTful APIs, and AFLUX

Language: Jupyter Notebook | License: MIT | Stargazers: 5 | Issues: 0

fairchem

FAIR Chemistry's library of machine learning methods for chemistry

Language: Python | License: NOASSERTION | Stargazers: 737 | Issues: 0

paperswithcode-client

API Client for paperswithcode.com

Language: Python | License: Apache-2.0 | Stargazers: 142 | Issues: 0

Reduced_Reused_Recycled

GitHub repository for "Reduced, Reused and Recycled" (NeurIPS 2021 Best Paper, D&B Track)

Language: Jupyter Notebook | Stargazers: 15 | Issues: 0

Awesome-LLMs-Datasets

Summarizes existing representative LLM text datasets.

License: Apache-2.0 | Stargazers: 759 | Issues: 0

OpenAGI

OpenAGI: When LLM Meets Domain Experts

Language: Python | License: MIT | Stargazers: 1871 | Issues: 0

TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Language: Python | License: NOASSERTION | Stargazers: 172 | Issues: 0

awesome-active-learning

A curated list of awesome active learning resources

License: CC0-1.0 | Stargazers: 697 | Issues: 0

open-images-dataset

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

Stargazers: 975 | Issues: 0

Blog

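A technical blog on Python machine learning algorithms, with original in-depth content and hands-on code! (For more, follow the WeChat official account "算法进阶".)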

Language: Jupyter Notebook | Stargazers: 753 | Issues: 0

datacardsplaybook

The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.

Language: TypeScript | License: Apache-2.0 | Stargazers: 171 | Issues: 0

broad_twitter_corpus

The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016)

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 64 | Issues: 0

SciSciNet

A large-scale open data lake for the science of science research.

Language: Jupyter Notebook | License: MIT | Stargazers: 55 | Issues: 0

TO-Scene

(ECCV 2022 Oral) TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes

Language: Python | License: MIT | Stargazers: 41 | Issues: 0

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python | Stargazers: 9727 | Issues: 0

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0 | Stargazers: 2613 | Issues: 0

s2orc-doc2json

Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)

Language: Python | License: Apache-2.0 | Stargazers: 325 | Issues: 0

MOSS

An open-source tool-augmented conversational language model from Fudan University

Language: Python | License: Apache-2.0 | Stargazers: 11897 | Issues: 0

refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

Language: Python | License: Apache-2.0 | Stargazers: 1382 | Issues: 0