BigScience Workshop (bigscience-workshop)

Research workshop on large language models - The Summer of Language Models 21

Home Page: https://bigscience.huggingface.co

Twitter: @BigScienceW

BigScience Workshop's repositories

petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Language: Python | License: MIT | Stargazers: 9089 | Issues: 91 | Issues: 200

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0 | Stargazers: 2644 | Issues: 32 | Issues: 162

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1317 | Issues: 24 | Issues: 144

bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Language: Shell | License: NOASSERTION | Stargazers: 973 | Issues: 38 | Issues: 19

xmtf

Crosslingual Generalization through Multitask Finetuning

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 510 | Issues: 6 | Issues: 22

t-zero

Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

Language: Python | License: Apache-2.0 | Stargazers: 456 | Issues: 24 | Issues: 21

biomedical

Tools for curating biomedical training data for large-scale language modeling

data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 299 | Issues: 24 | Issues: 12

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python | License: MIT | Stargazers: 98 | Issues: 4 | Issues: 26

lam

Libraries, Archives and Museums (LAM)

data_tooling

Tools for managing datasets for governance and training.

Language: HTML | License: Apache-2.0 | Stargazers: 77 | Issues: 16 | Issues: 261

multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language

Language: Python | License: Apache-2.0 | Stargazers: 69 | Issues: 16 | Issues: 24

evaluation

Code and Data for Evaluation WG

Language: Python | License: NOASSERTION | Stargazers: 41 | Issues: 23 | Issues: 51

metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 18 | Issues: 57
License: Apache-2.0 | Stargazers: 24 | Issues: 2 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 11 | Issues: 16 | Issues: 1

bloom-dechonk

A repo for running model shrinking experiments

Language: Python | Stargazers: 10 | Issues: 6 | Issues: 0

carbon-footprint

A repository for `codecarbon` logs.

Language: Jupyter Notebook | Stargazers: 10 | Issues: 14 | Issues: 1

catalogue_data

Scripts to prepare catalogue data

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 8 | Issues: 21 | Issues: 5

historical_texts

BigScience working group on language models for historical texts

Language: Jupyter Notebook | Stargazers: 8 | Issues: 24 | Issues: 0

massive-probing-framework

Framework for BLOOM probing

Language: Python | Stargazers: 8 | Issues: 2 | Issues: 0

pii_processing

PII Processing code to detect and remediate PII in BigScience datasets. Reference implementation for the PII Hackathon

Language: Python | License: NOASSERTION | Stargazers: 8 | Issues: 15 | Issues: 7

bibliography

A list of BigScience publications

Language: TeX | License: Apache-2.0 | Stargazers: 3 | Issues: 1 | Issues: 2

datasets_stats

Generate statistics over datasets used in the context of BigScience

Language: Makefile | Stargazers: 2 | Issues: 23 | Issues: 0

evaluation-robustness-consistency

Tools for evaluating model robustness and consistency

Language: Python | License: NOASSERTION | Stargazers: 2 | Issues: 19 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0

ShadesofBias

Evaluation for Shades of Bias in Text

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0