Chien Nguyen's starred repositories
flash-attention
Fast and memory-efficient exact attention
readme-md-generator
📄 CLI that generates beautiful README.md files
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
fsdp_qlora
Training LLMs with QLoRA + FSDP
llm-autoeval
Automatically evaluate your LLMs in Google Colab
TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
st-moe-pytorch
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch
KnowledgeEditor
Code for Editing Factual Knowledge in Language Models
Everything-of-Thoughts-XoT
An implementation of Everything of Thoughts (XoT).
DiffusionNER
Code for the paper "DiffusionNER: Boundary Diffusion for Named Entity Recognition", accepted at ACL 2023.
Rainbow-Table
Group Project for UO CS631 (Advanced Parallel Computing)