Genghan Zhang's repositories
CS224N-Spring2024-DFP-Student-Handout
Starter Code for Default Final Project, Spring 2024
ByteEngine
An LLM engine based on ByteTransformer.
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
ChatGLM-X
ChatGLM with xformers
compiler-and-arch
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
ctf
Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays
cutlass
CUDA Templates for Linear Algebra Subroutines
dejavu_profile
Profiling of Deja Vu kernels
FasterTransformer
Transformer-related optimization, including BERT and GPT
FlameGraph
Stack trace visualizer
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
GLM-demo
Codebase for the ChatGLM-6B demo.
googletest
GoogleTest - Google Testing and Mocking Framework
MyPicBed
This is my picbed (image hosting repository)
pyllama
LLaMA: Open and Efficient Foundation Language Models
splatt
The Surprisingly ParalleL spArse Tensor Toolkit.
taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
thuthesis
LaTeX Thesis Template for Tsinghua University
tvm.tl
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
Welder
Welder: a deep learning compiler (OSDI 2023)
xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
zhang677.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics