zhangxs's starred repositories

llama.cpp

LLM inference in C/C++

Language: C++ | License: MIT | 61203 518 3322

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 22193 200 3300

llama2.c

Inference Llama 2 in one file of pure C

Language: C | License: MIT | 16741 188 216

tvm_mlir_learn

A collection of compiler learning resources.

Language: Python | 1928 35 4

Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | 1880 67 4

MambaOut

MambaOut: Do We Really Need Mamba for Vision?

Language: Python | License: Apache-2.0 | 1868 6 240

ppl.nn

A primitive library for neural network

Language: C++ | License: Apache-2.0 | 1239 36 113

LookaheadDecoding

Language: Python | License: Apache-2.0 | 1034 11 55

tph-yolov5

Language: Python | License: GPL-3.0 | 702 7 68

InferLLM

A lightweight LLM inference framework

Language: C++ | License: Apache-2.0 | 656 10 54

Conformer

Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition

Language: Jupyter Notebook | License: Apache-2.0 | 520 6 38

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Language: Python | License: MIT | 490 8 16

rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Language: C++ | License: Apache-2.0 | 434 11 70

TinySAM

Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"

Language: Python | License: Apache-2.0 | 372 13 24

HorNet

[NeurIPS 2022] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Language: Python | License: MIT | 312 5 37

SlimSAM

SlimSAM: 0.1% Data Makes Segment Anything Slim

Language: Python | License: Apache-2.0 | 248 7 19

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python | 231 14 10

inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

Language: C++ | License: MIT | 230 7 16

ViT-CoMer

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Language: Python | License: Apache-2.0 | 169 4 18

CAMixerSR

CAMixerSR: Only Details Need More “Attention” (CVPR 2024)

Language: Python | License: Apache-2.0 | 157 4 23

dash-infer

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

Language: C++ | License: Apache-2.0 | 102 4 16

PipeFusion

A Suite of Parallel Approaches for Inference of Diffusion Transformer Models on GPU Clusters

Language: Python | License: Apache-2.0 | 93 1 16

Samba

Language: Python | License: Apache-2.0 | 85 2 6

LW-DETR

This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".

License: Apache-2.0 | 77 6 2

Effective-Fusion-Factor

Effective Fusion Factor in FPN for Tiny Object Detection (WACV 2021)

Language: Python | License: MIT | 59 2 3

u-mixformer

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

Language: Jupyter Notebook | License: Apache-2.0 | 5300

ITER

PyTorch codes for "Iterative Token Evaluation and Refinement for Real-World Super-Resolution", AAAI 2024

Language: Python | License: NOASSERTION | 46 4 1

ICELUT

Taming Lookup Tables for Efficient Image Retouching

Language: Python | 1300

Hetu-Galvatron

Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs).

Language: Python | 10 1 3

DHU-MMCT

Towards Effective Multi-Moving Camera Tracking: A New Dataset and Lightweight Link Model

Language: Python | License: NOASSERTION | 6 1 1