yi du's starred repositories

science

https://mlcommons.org/en/groups/research-science/

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 10 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 35652 | Issues: 0

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python | License: NOASSERTION | Stargazers: 600 | Issues: 0

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.

Language: Python | License: MIT | Stargazers: 3441 | Issues: 0

Awesome-Scientific-Language-Models

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

License: MIT | Stargazers: 385 | Issues: 0

ResponsibleNLP

Repository for research in the field of Responsible NLP at Meta.

Language: Python | License: NOASSERTION | Stargazers: 176 | Issues: 0

awesome-fairness-papers

Papers on fairness in NLP

Stargazers: 418 | Issues: 0

nomad

NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.

Language: JavaScript | License: Apache-2.0 | Stargazers: 64 | Issues: 0

Chemical-Data-Download

Download datasets (MP, OQMD, AFLOW, JARVIS, etc.) using Matminer, RESTful APIs, and AFLUX

Language: Jupyter Notebook | License: MIT | Stargazers: 5 | Issues: 0

fairchem

FAIR Chemistry's library of machine learning methods for chemistry

Language: Python | License: NOASSERTION | Stargazers: 712 | Issues: 0

paperswithcode-client

API Client for paperswithcode.com

Language: Python | License: Apache-2.0 | Stargazers: 140 | Issues: 0

Reduced_Reused_Recycled

GitHub repository for "Reduced, Reused and Recycled" (NeurIPS 2021 Best Paper, D&B Track)

Language: Jupyter Notebook | Stargazers: 15 | Issues: 0

Awesome-LLMs-Datasets

A summary of existing representative LLM text datasets.

License: Apache-2.0 | Stargazers: 720 | Issues: 0

OpenAGI

OpenAGI: When LLM Meets Domain Experts

Language: Python | License: MIT | Stargazers: 1826 | Issues: 0

TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Language: Python | License: NOASSERTION | Stargazers: 171 | Issues: 0

awesome-active-learning

A curated list of awesome active learning resources

License: CC0-1.0 | Stargazers: 690 | Issues: 0

open-images-dataset

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

Stargazers: 965 | Issues: 0

Blog

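A technical blog on Python machine learning algorithms, with original in-depth articles and hands-on code. [For more, follow the WeChat official account "算法进阶" (Advanced Algorithms).]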

Language: Jupyter Notebook | Stargazers: 740 | Issues: 0

datacardsplaybook

The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation.

Language: TypeScript | License: Apache-2.0 | Stargazers: 169 | Issues: 0

broad_twitter_corpus

The Broad Twitter Corpus, an English NER dataset stratified by time, location, social media genre, and socioeconomic factors (COLING 2016)

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 64 | Issues: 0

SciSciNet

A large-scale open data lake for the science of science research.

Language: Jupyter Notebook | License: MIT | Stargazers: 51 | Issues: 0

TO-Scene

(ECCV 2022 Oral) TO-Scene: A Large-scale Dataset for Understanding 3D Tabletop Scenes

Language: Python | License: MIT | Stargazers: 41 | Issues: 0

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python | Stargazers: 9562 | Issues: 0

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0 | Stargazers: 2585 | Issues: 0

s2orc-doc2json

Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)

Language: Python | License: Apache-2.0 | Stargazers: 320 | Issues: 0

MOSS

An open-source tool-augmented conversational language model from Fudan University

Language: Python | License: Apache-2.0 | Stargazers: 11883 | Issues: 0

refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

Language: Python | License: Apache-2.0 | Stargazers: 1377 | Issues: 0

BestPractices

Things that you should (and should not) do in your Materials Informatics research.

Language: Jupyter Notebook | License: MIT | Stargazers: 163 | Issues: 0

Huatuo-Llama-Med-Chinese

Repo for BenTsao (本草) [original name: HuaTuo (华驼)]: instruction-tuning large language models with Chinese medical knowledge.

Language: Python | License: Apache-2.0 | Stargazers: 4401 | Issues: 0

the-algorithm

Source code for Twitter's Recommendation Algorithm

Language: Scala | License: AGPL-3.0 | Stargazers: 61712 | Issues: 0