BigScience Workshop (bigscience-workshop)

Research workshop on large language models - The Summer of Language Models 21

Home page: https://bigscience.huggingface.co

Twitter: @BigScienceW

BigScience Workshop's repositories

petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Language: Python | License: MIT | Stargazers: 8699 | Issues: 87 | Issues: 186

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0 | Stargazers: 2510 | Issues: 30 | Issues: 162

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1244 | Issues: 24 | Issues: 143

bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Language: Shell | License: NOASSERTION | Stargazers: 939 | Issues: 36 | Issues: 19

xmtf

Crosslingual Generalization through Multitask Finetuning

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 495 | Issues: 5 | Issues: 22

t-zero

Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

Language: Python | License: Apache-2.0 | Stargazers: 448 | Issues: 24 | Issues: 21

biomedical

Tools for curating biomedical training data for large-scale language modeling

data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 279 | Issues: 24 | Issues: 12

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python | License: MIT | Stargazers: 91 | Issues: 4 | Issues: 26

lam

Libraries, Archives and Museums (LAM)

data_tooling

Tools for managing datasets for governance and training.

Language: HTML | License: Apache-2.0 | Stargazers: 74 | Issues: 16 | Issues: 261

multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language

Language: Python | License: Apache-2.0 | Stargazers: 65 | Issues: 16 | Issues: 24

evaluation

Code and data for the Evaluation WG

Language: Python | License: NOASSERTION | Stargazers: 41 | Issues: 23 | Issues: 51

metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 18 | Issues: 57

License: Apache-2.0 | Stargazers: 23 | Issues: 2 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 11 | Issues: 16 | Issues: 1

carbon-footprint

A repository for `codecarbon` logs.

Language: Jupyter Notebook | Stargazers: 10 | Issues: 14 | Issues: 1

bloom-dechonk

A repo for running model shrinking experiments

Language: Python | Stargazers: 8 | Issues: 5 | Issues: 0

catalogue_data

Scripts to prepare catalogue data

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 8 | Issues: 21 | Issues: 5

historical_texts

BigScience working group on language models for historical texts

Language: Jupyter Notebook | Stargazers: 8 | Issues: 24 | Issues: 0

pii_processing

PII processing code to detect and remediate PII in BigScience datasets; reference implementation for the PII Hackathon.

Language: Python | License: NOASSERTION | Stargazers: 8 | Issues: 15 | Issues: 7

massive-probing-framework

Framework for BLOOM probing

Language: Python | Stargazers: 7 | Issues: 1 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 5 | Issues: 2 | Issues: 0

bibliography

A list of BigScience publications

Language: TeX | License: Apache-2.0 | Stargazers: 3 | Issues: 1 | Issues: 2

datasets_stats

Generate statistics over datasets used in the context of BigScience

Language: Makefile | Stargazers: 2 | Issues: 23 | Issues: 0

evaluation-robustness-consistency

Tools for evaluating model robustness and consistency

Language: Python | License: NOASSERTION | Stargazers: 2 | Issues: 19 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0