Jed Cheng's repositories
c4-dataset-script
Inspired by Google's C4 (Colossal Clean Crawled Corpus), this is a series of data cleaning scripts for processing CommonCrawl data. It includes Chinese data processing and the cleaning methods from MassiveText.
bert-tokenizer-cantonese
BERT Tokenizer with vocabulary tailored for Cantonese
spin-torque-oscillators-reservoir-computing
Reservoir Computing for time series predictions with Spin Torque Oscillators
3
GPU-accelerated micromagnetic simulator
cantonese_langauge_model
Resources for my Cantonese language model research
FeeLLGood
FEM micromagnetic simulator
FOTS.PyTorch
FOTS PyTorch implementation
Hierarchical-Localization
Visual localization made easy with hloc
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
write-with-gpt2
A simple text editor with suggestions generated from a Cantonese GPT2 model