ZZK's repositories
open-resume
OpenResume is a powerful open-source resume builder and resume parser. https://open-resume.com/
tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
CUDALibrarySamples
CUDA Library Samples
CV-CUDA
CV-CUDA™ is an open-source, graphics processing unit (GPU)-accelerated library for cloud-scale image processing and computer vision.
docs
Documentation for PaddlePaddle
dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from system components such as the Linux kernel, CPUs, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
EdgeGPT
Reverse-engineered API of Microsoft's Bing Chat AI
FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Up to 100x faster than other offloading systems.
Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
GPTQ-triton
GPTQ inference Triton kernel
InferLLM
A lightweight LLM inference framework
kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
LLMSurvey
A collection of papers and resources related to Large Language Models.
matxscript
A framework for model pre-processing and post-processing
nccl-tests
NCCL Tests
PTX-ISA
Chinese translation of the CUDA PTX ISA documentation
QuickMathHPP
A single-header math library
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
taichi-nerfs
Implementations of NeRF variants based on Taichi + PyTorch
typst
A new markup-based typesetting system that is powerful and easy to learn.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs