Chris Ha's repositories
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXt, DPN, EfficientNet, MixNet, MobileNet-V3/V2/V1, MNASNet, Single-Path NAS, FBNet, and more
BigLittleNet
Official repository for Big-Little Net
CMT_CNN-meet-Vision-Transformer
A PyTorch implementation of CMT, based on the paper "CMT: Convolutional Neural Networks Meet Vision Transformers".
data_tooling
Tools for managing datasets for governance and training.
datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
dps
Data processing system for polyglot
fast-counter
Faster concurrent atomic number updates
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., upcoming)
oscar-tools
The original tooling for the OSCAR corpus rewritten in Rust
oscar-website
The website of the OSCAR project
PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.
peS2o
Pretraining Efficiently on S2ORC!
pii-transform
Perform transformations on PII instances detected in documents
rust-bloom-filter
A fast Bloom filter implementation in Rust
rust-github-demo
This is for demoing features of GitHub
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
tlsh
An xxh-enhanced version of the Rust port of TLSH
ungoliant
🕷️ The pipeline for the OSCAR corpus
VoCapXLM
Code for the EMNLP 2021 paper "Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training"
warc-specifications
Centralised repository for WARC usage specifications.
xxhash-c-sys
Rust raw bindings to xxHash