sirius93123

sirius93123's repositories

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

License: MIT | Stargazers: 0 | Issues: 0

awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

License: Apache-2.0 | Stargazers: 0 | Issues: 0

FlagGems

FlagGems is an operator library for large language models implemented in Triton Language.

License: Apache-2.0 | Stargazers: 0 | Issues: 0
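
FlagGems' operators are written in Triton, whose programming model is easiest to see in a tiny example. The sketch below is a generic Triton vector-add kernel, not an operator taken from FlagGems; the kernel name and block size are arbitrary choices.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out


x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```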

tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

License: MIT | Stargazers: 0 | Issues: 0

Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | Stargazers: 0 | Issues: 0

Efficient-LLMs-Survey

Efficient Large Language Models: A Survey

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Stargazers: 0 | Issues: 0

Awesome-Video-Diffusion-Models

[arXiv] A Survey on Video Diffusion Models

Stargazers: 0 | Issues: 0

AITemplate

AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Atom

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Stargazers: 0 | Issues: 0
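
As a rough illustration of what low-bit weight quantization does, here is a minimal per-channel symmetric INT4 round-trip in PyTorch. This is only a sketch of the general idea, not Atom's actual quantization scheme; the function names and the per-row scaling choice are illustrative assumptions.

```python
import torch


def quantize_int4_symmetric(w: torch.Tensor):
    """Per-output-channel symmetric 4-bit quantization (illustrative only).

    w: [out_features, in_features] FP16/FP32 weight matrix.
    Returns an int8 tensor holding values in [-8, 7] plus a per-row scale.
    """
    qmax = 7  # signed 4-bit range is [-8, 7]
    scale = w.abs().amax(dim=1, keepdim=True) / qmax  # [out_features, 1]
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale


def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale


# Quick round-trip check on a random weight matrix.
w = torch.randn(256, 512, dtype=torch.float16)
q, s = quantize_int4_symmetric(w)
w_hat = dequantize(q, s)
print((w - w_hat).abs().max())  # quantization error, on the order of scale / 2
```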

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

License: Apache-2.0 | Stargazers: 0 | Issues: 0
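
For reference, the computation an FP16xINT4 (W4A16) kernel performs can be written as unpack, dequantize, then FP16 GEMM. The PyTorch sketch below is a slow correctness reference under an assumed nibble-packing layout; it is not Marlin's weight format or kernel.

```python
import torch


def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    """Unpack each int32 word into 8 signed 4-bit values (illustrative layout).

    packed: [K // 8, N] int32. Returns [K, N] int8 values in [-8, 7].
    The nibble ordering here is an assumption; real kernels use tuned layouts.
    """
    shifts = torch.arange(0, 32, 4, device=packed.device)           # 8 nibbles per word
    nibbles = (packed.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF  # [K//8, 8, N]
    signed = torch.where(nibbles >= 8, nibbles - 16, nibbles)       # two's complement
    return signed.reshape(-1, packed.shape[1]).to(torch.int8)


def w4a16_matmul_ref(a_fp16: torch.Tensor, packed_w: torch.Tensor, scales: torch.Tensor):
    """a_fp16: [M, K] fp16, packed_w: [K//8, N] int32, scales: [N] fp16 per-column scales."""
    w = unpack_int4(packed_w).to(torch.float16) * scales  # dequantize to fp16
    return a_fp16 @ w                                      # [M, N] fp16 GEMM
```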

cpplinks

A categorized list of C++ resources.

Stargazers: 0 | Issues: 0

accel-sim-framework

This is the top-level repository for the Accel-Sim framework.

License: NOASSERTION | Stargazers: 0 | Issues: 0

nvbench

CUDA Kernel Benchmarking Library

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

License: MIT | Stargazers: 0 | Issues: 0

mlir-tutorial-ch

Hands-On Practical MLIR Tutorial

License: Apache-2.0 | Stargazers: 0 | Issues: 0

trident

A performance library for machine learning applications.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

FinGPT

Data-Centric FinGPT: open-source for open finance! 🔥 The trained models are released on HuggingFace.

License: MIT | Stargazers: 0 | Issues: 0

DecryptPrompt

A summary of Prompt & LLM papers, open-source datasets & models, and AIGC applications

Stargazers: 0 | Issues: 0

mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.

License: MIT | Stargazers: 0 | Issues: 0
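
The core idea behind such tensor-core HGEMM kernels is tiling: each warp accumulates a small output tile (16x16x16 for the basic WMMA shape) in higher precision. The NumPy sketch below shows only the blocked decomposition and FP32 accumulation; the warp/fragment mapping and the WMMA/MMA instructions themselves have no Python equivalent.

```python
import numpy as np


def blocked_hgemm(A, B, tile=16):
    """Blocked FP16 GEMM with FP32 accumulation (mirrors the 16x16x16 WMMA tile shape).

    A: [M, K] float16, B: [K, N] float16; M, N, K assumed divisible by `tile`.
    """
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros((tile, tile), dtype=np.float32)  # per-tile accumulator (like a WMMA fragment)
            for k in range(0, K, tile):
                a = A[i:i + tile, k:k + tile].astype(np.float32)
                b = B[k:k + tile, j:j + tile].astype(np.float32)
                acc += a @ b
            C[i:i + tile, j:j + tile] = acc
    return C.astype(np.float16)


A = np.random.randn(64, 64).astype(np.float16)
B = np.random.randn(64, 64).astype(np.float16)
ref = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float16)
print(np.abs(blocked_hgemm(A, B) - ref).max())  # ~0 up to fp16 rounding
```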

LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Stargazers: 0 | Issues: 0
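
The accept/reject rule at the heart of speculative decoding is short enough to sketch directly. The function below is an illustrative single verification step under standard speculative sampling (accept a draft token with probability min(1, p/q), otherwise resample from the residual distribution); the tensor shapes and the function name are assumptions, not this repository's API.

```python
import torch


def speculative_step(target_probs, draft_probs, draft_tokens):
    """One speculative-decoding verification step (illustrative).

    draft_tokens: [K] tokens proposed by the small draft model.
    draft_probs:  [K, V] draft-model distributions for those positions.
    target_probs: [K + 1, V] target-model distributions for the same positions,
                  plus one extra position for the bonus token.
    Returns the list of accepted token ids.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i, tok]
        q = draft_probs[i, tok]
        if torch.rand(()) < torch.clamp(p / q, max=1.0):  # accept with prob min(1, p/q)
            accepted.append(int(tok))
        else:
            # Rejected: resample from the residual distribution max(0, p - q), renormalized.
            residual = torch.clamp(target_probs[i] - draft_probs[i], min=0.0)
            residual = residual / residual.sum()
            accepted.append(int(torch.multinomial(residual, 1)))
            return accepted
    # All K drafts accepted: sample one bonus token from the target model.
    accepted.append(int(torch.multinomial(target_probs[len(draft_tokens)], 1)))
    return accepted
```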

tvm_gpu_gemm

Playing with GEMM in TVM

Stargazers: 0 | Issues: 0
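
A minimal version of "playing with GEMM in TVM" using the legacy tensor-expression API (te.create_schedule), which newer TVM releases are replacing with TensorIR: define the reduction, split the output into 16x16 tiles, and bind the loops to CUDA blocks and threads. The matrix sizes, tile size, and target are arbitrary choices for the sketch.

```python
import tvm
from tvm import te

# Declare C[i, j] = sum_k A[i, k] * B[k, j].
M = N = K = 1024
A = te.placeholder((M, K), name="A", dtype="float32")
B = te.placeholder((K, N), name="B", dtype="float32")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# Naive GPU schedule: one 16x16 output tile per thread block, one element per thread.
s = te.create_schedule(C.op)
i, j = s[C].op.axis
io, ii = s[C].split(i, factor=16)
jo, ji = s[C].split(j, factor=16)
s[C].bind(io, te.thread_axis("blockIdx.y"))
s[C].bind(jo, te.thread_axis("blockIdx.x"))
s[C].bind(ii, te.thread_axis("threadIdx.y"))
s[C].bind(ji, te.thread_axis("threadIdx.x"))

mod = tvm.build(s, [A, B, C], target="cuda")
print(mod.imported_modules[0].get_source())  # dump the generated CUDA kernel
```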

torchqtm

TorchQuantum is a backtesting framework that integrates the structure of PyTorch with WorldQuant's operators for efficient quantitative financial analysis.

License: MIT | Stargazers: 0 | Issues: 0

baichuan-7B

A large-scale 7B pretrained language model developed by BaiChuan-Inc.

License: Apache-2.0 | Stargazers: 0 | Issues: 0