sirius93123's repositories
accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Atom
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Awesome-Video-Diffusion-Models
[Arxiv] A Survey on Video Diffusion Models
baichuan-7B
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
cpplinks
A categorized list of C++ resources.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
DecryptPrompt
A summary of Prompt & LLM papers, open-source datasets & models, and AIGC applications.
Efficient-LLMs-Survey
Efficient Large Language Models: A Survey
FinGPT
Data-centric FinGPT. Open source for open finance! 🔥 We release the trained models on Hugging Face.
FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
LLMSpeculativeSampling
Fast inference from large language models via speculative decoding.
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
mlc-llm
Enable everyone to develop, optimize, and deploy AI models natively on their own devices.
mlir-tutorial-ch
Hands-On Practical MLIR Tutorial
nvbench
CUDA Kernel Benchmarking Library
torchqtm
TorchQuantum is a backtesting framework that integrates the structure of PyTorch and WorldQuant's Operator for efficient quantitative financial analysis.
trident
A performance library for machine learning applications.
tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
tvm_gpu_gemm
Playing with GEMM in TVM.