sirius93123's repositories
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Efficient-LLMs-Survey
Efficient Large Language Models: A Survey
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
Awesome-Video-Diffusion-Models
[arXiv] A Survey on Video Diffusion Models
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Atom
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
cpplinks
A categorized list of C++ resources.
accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
nvbench
CUDA Kernel Benchmarking Library
Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
mlir-tutorial-ch
Hands-On Practical MLIR Tutorial
trident
A performance library for machine learning applications.
FinGPT
Data-Centric FinGPT. Open-source for open finance! Trained models are released on HuggingFace.
DecryptPrompt
A summary of Prompt & LLM papers, open-source datasets & models, and AIGC applications.
mlc-llm
Enable everyone to develop, optimize, and deploy AI models natively on their own devices.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
tvm_gpu_gemm
Experiments with GEMM optimization in TVM
torchqtm
TorchQuantum is a backtesting framework that integrates the structure of PyTorch with WorldQuant's operators for efficient quantitative financial analysis.
baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.