BigScience Workshop (bigscience-workshop)

Research workshop on large language models - The Summer of Language Models 21

Home Page: https://bigscience.huggingface.co

Twitter: @BigScienceW

BigScience Workshop's repositories

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python | License: Apache-2.0 | Stargazers: 684 | Issues: 20 | Issues: 141

bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Language: Shell | License: NOASSERTION | Stargazers: 308 | Issues: 25 | Issues: 5

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 153 | Issues: 6 | Issues: 80

t-zero

Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

Language: Python | License: Apache-2.0 | Stargazers: 151 | Issues: 24 | Issues: 11

biomedical

Tools for curating biomedical training data for large-scale language modeling

Language: Python | Stargazers: 135 | Issues: 0 | Issues: 0

evaluation

Code and Data for Evaluation WG

Language: Python | License: NOASSERTION | Stargazers: 36 | Issues: 20 | Issues: 51

data_tooling

Tools for managing datasets for governance and training.

Language: HTML | License: Apache-2.0 | Stargazers: 32 | Issues: 14 | Issues: 261

data_sourcing

This directory gathers the tools developed by the Data Sourcing Working Group

Language: Python | License: Apache-2.0 | Stargazers: 26 | Issues: 16 | Issues: 8

metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.

Language: Python | License: Apache-2.0 | Stargazers: 16 | Issues: 18 | Issues: 57

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python | License: MIT | Stargazers: 11 | Issues: 0 | Issues: 0

bigscience-workshop.github.io

Alternative to https://github.com/Dynalon/mdwiki-seed

Language: HTML | Stargazers: 10 | Issues: 1 | Issues: 0

historical_texts

BigScience working group on language models for historical texts

Language: Jupyter Notebook | Stargazers: 7 | Issues: 23 | Issues: 0
License: Apache-2.0 | Stargazers: 6 | Issues: 0 | Issues: 0

pii_processing

PII Processing code to detect and remediate PII in BigScience datasets. Reference implementation for the PII Hackathon

Language: Python | License: NOASSERTION | Stargazers: 5 | Issues: 15 | Issues: 7
Language: Python | License: Apache-2.0 | Stargazers: 5 | Issues: 16 | Issues: 0

carbon-footprint

A repository for `codecarbon` logs.

Language: Jupyter Notebook | Stargazers: 4 | Issues: 15 | Issues: 1

data-preparation

Code used for sourcing and cleaning the BigScience corpus

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4 | Issues: 0 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 1 | Issues: 0

bloom-dechonk

A repo for running model shrinking experiments

Language: Python | Stargazers: 3 | Issues: 0 | Issues: 0

catalogue_data

Scripts to prepare catalogue data

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3 | Issues: 20 | Issues: 5

scaling-laws-tokenization

scaling-laws-tokenization

License: Apache-2.0 | Stargazers: 2 | Issues: 15 | Issues: 0

datasets_stats

Generate statistics over datasets used in the context of BigScience

Language: Makefile | Stargazers: 1 | Issues: 22 | Issues: 0

evaluation-robustness-consistency

Tools for evaluating model robustness and consistency

Language: Python | License: NOASSERTION | Stargazers: 1 | Issues: 19 | Issues: 0

amazon-sagemaker-mlflow-fargate

Managing your machine learning lifecycle with MLflow and Amazon SageMaker

Language: Jupyter Notebook | License: MIT-0 | Stargazers: 0 | Issues: 0 | Issues: 0

codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

lam

Libraries, Archives and Museums (LAM)

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0