pengcuo's repositories

onnx-simplifier

Simplify your ONNX model

Language: C++ · License: Apache-2.0 · Stargazers: 1 · Issues: 0
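
A minimal usage sketch, assuming the onnxsim package is installed (file names are placeholders):

    # Hedged sketch: simplify an ONNX model with onnxsim (paths are placeholders).
    import onnx
    from onnxsim import simplify

    model = onnx.load("model.onnx")                # load the original graph
    model_simplified, check = simplify(model)      # fold constants, remove redundant ops
    assert check, "Simplified model failed the validation check"
    onnx.save(model_simplified, "model_simplified.onnx")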

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 0 · Issues: 0

ChatPaper

Use ChatGPT to summarize arXiv papers.

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

ColossalAI

Making big AI models cheaper, easier to use, and more scalable

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

dbg-macro

A dbg(…) macro for C++

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

excelPanel

A two-dimensional RecyclerView for Android that can load not only historical data but also future data.

Language: Java · License: Apache-2.0 · Stargazers: 0 · Issues: 0

FlagAttention

A collection of memory-efficient attention operators implemented in the Triton language.

License: NOASSERTION · Stargazers: 0 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause · Stargazers: 0 · Issues: 0
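
A rough usage sketch, assuming the flash-attn package is installed and a CUDA GPU is available; tensor shapes are illustrative:

    # Hedged sketch: call the fused flash-attention kernel on half-precision CUDA tensors.
    import torch
    from flash_attn import flash_attn_func

    # (batch, seqlen, num_heads, head_dim), fp16 on GPU as the kernel expects
    q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
    v = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")

    out = flash_attn_func(q, k, v, causal=True)  # exact attention without materializing the score matrix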

Llama-Chinese

The Llama Chinese community. Llama 3 online demos and fine-tuned models are now available, the latest Llama 3 learning resources are aggregated in real time, and all code has been updated for Llama 3. Building the best Chinese Llama large model, fully open source and commercially usable.

Language: Python · Stargazers: 0 · Issues: 0

llama-recipes

Scripts for fine-tuning Llama 2 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization & question answering, plus a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Llama 2 for WhatsApp & Messenger.

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 0 · Issues: 0

llm_interview_note

Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm/application engineers.

Stargazers: 0 · Issues: 0

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
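
A brief sketch of how loralib is typically used, assuming it is installed; layer sizes and rank are illustrative:

    # Hedged sketch: swap a dense layer for its LoRA counterpart and train only the adapters.
    import torch
    import loralib as lora

    model = torch.nn.Sequential(
        lora.Linear(768, 768, r=8),   # rank-8 low-rank update on top of a frozen weight
        torch.nn.ReLU(),
        torch.nn.Linear(768, 10),
    )

    lora.mark_only_lora_as_trainable(model)  # freeze everything except the LoRA A/B matrices
    optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)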

marlin

An FP16×INT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers: 0 · Issues: 0

MQBench

Model Quantization Benchmark

Language: Shell · License: Apache-2.0 · Stargazers: 0 · Issues: 0

namegpt

Generate unique and creative project names in seconds with AI!

Language: TypeScript · License: MIT · Stargazers: 0 · Issues: 0

onnx-modifier

A tool to modify ONNX models visually, based on Netron and Flask.

Language: JavaScript · License: MIT · Stargazers: 0 · Issues: 0

ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

prajna

A programming language for AI infrastructure

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

PyTorch_YOLOv1

A new version of YOLOv1

Language: Python · Stargazers: 0 · Issues: 0

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

License: Apache-2.0 · Stargazers: 0 · Issues: 0
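
One hedged way to use it, assuming a local SGLang server has been launched with its OpenAI-compatible endpoint on port 30000; the launch command and model alias below are assumptions:

    # Hedged sketch: talk to a locally running SGLang server through the OpenAI-compatible API.
    # Assumes something like: python -m sglang.launch_server --model-path <model> --port 30000
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="default",  # assumption: the server-side default model alias
        messages=[{"role": "user", "content": "Summarize paged attention in one sentence."}],
    )
    print(resp.choices[0].message.content)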

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0
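
A rough sketch of building an engine from an ONNX file with the TensorRT Python API (TensorRT 8.x-style calls; the file names and FP16 flag are illustrative assumptions):

    # Hedged sketch: parse an ONNX model and build a serialized TensorRT engine.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)              # assumption: an FP16 build is desired
    engine_bytes = builder.build_serialized_network(network, config)

    with open("model.plan", "wb") as f:
        f.write(engine_bytes)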

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

License: Apache-2.0 · Stargazers: 0 · Issues: 0
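
A minimal pipeline example; the library downloads whatever default checkpoint it ships for this task on first use:

    # Hedged sketch: run a sentiment-analysis pipeline with the default checkpoint.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier("Fast and memory-efficient exact attention is great."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]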

triton

Development repository for the Triton language and compiler

License: MIT · Stargazers: 0 · Issues: 0
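
A small vector-add kernel in the Triton language, closely following the style of the official tutorials; names and block size are illustrative:

    # Hedged sketch: element-wise vector addition written as a Triton kernel.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                    # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)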

United-Perception

United Perception

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

unsloth

Fine-tune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
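
A rough sketch of the typical loading path, assuming the package is installed and a CUDA GPU is present; the checkpoint name and hyperparameters are illustrative:

    # Hedged sketch: load a 4-bit model with Unsloth and attach LoRA adapters for fine-tuning.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # assumption: any supported checkpoint works here
        max_seq_length=2048,
        load_in_4bit=True,
    )

    model = FastLanguageModel.get_peft_model(
        model,
        r=16,                                      # LoRA rank, illustrative
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    # `model` can now be handed to a standard TRL/transformers training loop.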