Fei Hu's repositories
FasterTransformer
Transformer-related optimizations, including BERT and GPT
tensorflow
Computation using data flow graphs for scalable machine learning
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
awesome-courses
:books: List of awesome university courses for learning Computer Science!
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
feihugis.github.io
Fei Hu's Blog
flash-attention
Fast and memory-efficient exact attention
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
GPTQ-triton
GPTQ inference Triton kernel
graph-learn
hardware-effects
Demonstration of various hardware effects.
onnxruntime
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
photoprism
Personal Photo Management powered by Go and Google TensorFlow
text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
triton-adsbrain-backend
Common source code, scripts, and utilities for creating Triton backends.
triton-server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
TurboTransformers
A fast and user-friendly tool for transformer inference on CPU and GPU