sirius93123's repositories
accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Atom
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
Awesome-Video-Diffusion-Models
[Arxiv] A Survey on Video Diffusion Models
baichuan-7B
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
cpplinks
A categorized list of C++ resources.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
DecryptPrompt
A summary of Prompt & LLM papers, open-source datasets & models, and AIGC applications.
Efficient-LLMs-Survey
Efficient Large Language Models: A Survey
FinGPT
Data-centric FinGPT. Open source for open finance! 🔥 We release the trained models on Hugging Face.
FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
LLMSpeculativeSampling
Fast inference from large language models via speculative decoding.
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
mlc-llm
Enable everyone to develop, optimize, and deploy AI models natively on their own devices.
mlir-tutorial-ch
Hands-On Practical MLIR Tutorial
nvbench
CUDA Kernel Benchmarking Library
torchqtm
TorchQuantum is a backtesting framework that integrates the structure of PyTorch and WorldQuant's Operator for efficient quantitative financial analysis.
trident
A performance library for machine learning applications.
tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
tvm_gpu_gemm
Playing with GEMM in TVM.