Fei Hu's repositories
FasterTransformer
Transformer-related optimizations, including BERT and GPT
tensorflow
Computation using data flow graphs for scalable machine learning
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
awesome-courses
:books: List of awesome university courses for learning Computer Science!
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
feihugis.github.io
Fei Hu's Blog
flash-attention
Fast and memory-efficient exact attention
gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
GPTQ-triton
GPTQ inference Triton kernel
graph-learn
hardware-effects
Demonstration of various hardware effects.
onnxruntime
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
photoprism
Personal Photo Management powered by Go and Google TensorFlow
text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
triton-adsbrain-backend
Common source code, scripts, and utilities for creating Triton backends.
triton-server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
TurboTransformers
A fast and user-friendly tool for transformer inference on CPU and GPU