Kaiqiang Song's repositories
multilingual-rouge
A multilingual ROUGE package (modeled on rouge_score) that uses a BPE tokenizer from Hugging Face
nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
AI-Paper-Collector
Fully-automated scripts for collecting AI-related papers
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
DeepSpeedExamples
Example models using DeepSpeed
flash-attention
Fast and memory-efficient exact attention
gitignore
A collection of useful .gitignore templates
gpt-crawler
Crawl a site to generate knowledge files to create your own custom GPT from a URL
knn-transformers
PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
LLaMA-Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
long-range-arena
Long Range Arena for Benchmarking Efficient Transformers
nlp-in-ling
Natural Language Processing Research in North American Linguistics Departments
NLPDataSet
A record of some datasets I have compiled
summarization-datasets
Pre-processing and in some cases downloading of datasets for the paper "Content Selection in Deep Learning Models of Summarization."
transformer-ls
Official implementation of Long-Short Transformer in PyTorch.
transformers-bloom-inference
Fast Inference Solutions for BLOOM