Jianbin Chang's repositories
c4-dataset-script
Inspired by Google's C4 (Colossal Clean Crawled Corpus), a series of data-cleaning scripts focused on Common Crawl processing, including the Chinese data processing and cleaning methods described in MassiveText.
blueprint-trainer
Scaffolding for sequence model training research.
apex
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch.
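A minimal mixed-precision sketch using apex's `amp` API (the O1 recipe from the apex docs); the toy linear model, optimizer, and data below are illustrative placeholders, and apex with CUDA is assumed to be installed:

```python
# Illustrative sketch of apex.amp mixed-precision training (opt_level "O1").
# The model, optimizer, and data are placeholders, not from any repo above.
import torch
from apex import amp

model = torch.nn.Linear(16, 4).cuda()  # placeholder model; apex.amp expects CUDA
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Wrap model and optimizer; "O1" patches selected ops to run in FP16.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 16, device="cuda")
targets = torch.randint(0, 4, (8,), device="cuda")
loss = torch.nn.functional.cross_entropy(model(inputs), targets)

# Scale the loss to avoid FP16 gradient underflow, then step as usual.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```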
bagua-core
Core communication lib for Bagua.
BLOOM-COT
Ongoing research training transformer language models at scale, including: BERT & GPT-2
ColossalAI-Examples
Examples of training models with hybrid parallelism using ColossalAI
GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
GPU-math
🤯 GPU math & benchmarks, branched from mli/transformers-benchmarks
hyena-jax
JAX/Flax implementation of the Hyena Hierarchy
juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch
Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
NeMo
NeMo: a toolkit for conversational AI
OptimalShardedDataParallel
An automated parallel training system that combines the advantages of both data and model parallelism. If you are interested, please visit/star/fork https://github.com/Youhe-Jiang/OptimalShardedDataParallel
RWKV-LM
RWKV is an RNN with transformer-level LLM performance that can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
safari
Convolutions for Sequence Modeling
TimeChamber
A Massively Parallel Large Scale Self-Play Framework
tinygrad
You like pytorch? You like micrograd? You love tinygrad! ❤️
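A minimal autograd sketch mirroring the example from tinygrad's README (the `Tensor` import path has varied between tinygrad versions, so treat it as an assumption):

```python
# Tiny forward/backward pass in tinygrad, adapted from its README example.
from tinygrad.tensor import Tensor  # newer versions: from tinygrad import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()   # scalar output
z.backward()            # populate .grad on x and y

print(x.grad.numpy())   # dz/dx
print(y.grad.numpy())   # dz/dy
```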
Titans
A collection of models built with ColossalAI
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
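A minimal sketch of the library's high-level `pipeline` API; the task string selects a default pretrained model, which is downloaded on first use:

```python
# Quick inference with the transformers pipeline API.
from transformers import pipeline

# "sentiment-analysis" loads a default pretrained classifier on first call.
classifier = pipeline("sentiment-analysis")
print(classifier("tinygrad and transformers are both fun to read."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```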