Taowen (Tony)'s repositories
ICHaskellStyleGuide
A Haskell style guide that follows conventions in Imperial College 40009 Computing Practical.
bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
cpufp
A CPU tool for benchmarking peak floating-point performance
dotfiles
My dotfiles
FasterTransformer
Transformer-related optimization, including BERT and GPT
How_to_optimize_in_GPU
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
LLM_Tree_Search
The official implementation of the paper "Alphazero-like Tree-Search can guide large language model decoding and training"
Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
memory-efficient-attention-pytorch
Implementation of memory-efficient multi-head attention as proposed in the paper "Self-attention Does Not Need O(n²) Memory"
microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
omnisafe
OmniSafe is an infrastructural framework for accelerating SafeRL research.
paper_reading
A shared paper reading repository for people in the group
please
A command-line copilot
Retriever
Retriever-0.1B
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs