pengcuo's repositories
onnx-simplifier
Simplify your onnx model
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
ChatPaper
Use ChatGPT to summarize the arXiv papers.
ColossalAI
Making big AI models cheaper, easier, and scalable
dbg-macro
A dbg(…) macro for C++
excelPanel
A two-dimensional RecyclerView for Android that can load both historical and future data.
FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
flash-attention
Fast and memory-efficient exact attention
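The "memory-efficient" part of flash-attention comes from the online (streaming) softmax trick: scores are processed block-by-block with a running max and running denominator, so the full attention row is never materialized. A minimal single-query NumPy sketch (function names are illustrative, not the library's API):

```python
import numpy as np

def attention_online(q, K, V, block=4):
    """Single-query attention computed block-by-block over the keys,
    using the online-softmax rescaling that FlashAttention builds on.
    Illustrative sketch only; the real kernel tiles queries too and
    runs fused on the GPU."""
    m = -np.inf                                   # running max of scores
    s = 0.0                                       # running softmax denominator
    acc = np.zeros(V.shape[1])                    # running weighted sum of values
    for start in range(0, K.shape[0], block):
        scores = K[start:start + block] @ q       # scores for this key block
        m_new = max(m, scores.max())
        corr = np.exp(m - m_new)                  # rescale old state to new max
        w = np.exp(scores - m_new)
        s = s * corr + w.sum()
        acc = acc * corr + w @ V[start:start + block]
        m = m_new
    return acc / s

def attention_naive(q, K, V):
    """Reference: ordinary softmax attention in one shot."""
    scores = K @ q
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V
```

Both functions produce the same output; the streaming version just never needs all the scores at once.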
Llama-Chinese
Llama Chinese community. Llama 3 online demos and fine-tuned models are now available, and the latest Llama 3 learning resources are collected in real time; all code has been updated for Llama 3. Building the best Chinese Llama LLM, fully open source and commercially usable.
llama-recipes
Scripts for fine-tuning Llama 2 with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and question answering, and a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Llama 2 for WhatsApp and Messenger.
llm_interview_note
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm/application engineers.
LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
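The core LoRA idea can be sketched in a few lines of NumPy: the frozen weight W is augmented by a trainable low-rank update B @ A, scaled by alpha / r. The function below is a hypothetical illustration of the math, not loralib's API:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """LoRA forward pass sketch. W (d_out, d_in) stays frozen; only
    A (r, d_in) and B (d_out, r) are trained, adding just
    r * (d_in + d_out) parameters per layer."""
    r = A.shape[0]
    # Equivalent to x @ (W + (alpha / r) * B @ A).T, but computed
    # without ever forming the merged weight.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

At deployment the update can be merged back into W, so LoRA adds no inference-time overhead.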
marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
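The INT4 weight format such a kernel dequantizes on the fly can be sketched as symmetric per-tensor quantization (a simplification; Marlin itself uses grouped scales and a packed GPU layout):

```python
import numpy as np

def quantize_int4(w):
    """Symmetric INT4 quantization sketch: map FP weights onto the
    integer range [-8, 7] with a single FP scale. Real FP16xINT4
    kernels use per-group scales and pack two nibbles per byte."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate FP weights; error is at most scale / 2."""
    return q.astype(np.float32) * scale
```

Storing 4-bit weights cuts weight memory (and memory traffic) by ~4x versus FP16, which is where the speedup in memory-bound decoding comes from.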
Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
MQBench
Model Quantization Benchmark
namegpt
Generate unique and creative project names in seconds with AI!
onnx-modifier
A tool for modifying ONNX models visually, based on Netron and Flask.
ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
prajna
A programming language for AI infrastructure.
PyTorch_YOLOv1
A new version of YOLOv1
sglang
SGLang is a fast serving framework for large language models and vision language models.
TensorRT
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
triton
Development repository for the Triton language and compiler
United-Perception
United Perception
unsloth
Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory