Peyton's repositories
SimpleCVPaperReading
:smile: Blog paper reading list, organized by series
Awesome-LLM-Prune
Awesome list for LLM pruning.
Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Awesome-Mamba-Papers
Awesome Papers related to Mamba.
BitDistiller
A novel QAT with Self-Distillation framework to enhance ultra-low-bit LLMs.
corenet
CoreNet: A library for training deep neural networks
DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
Firefly
Firefly: a large language model training toolkit, supporting training of Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
llama-recipes
Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization & question answering, plus a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Llama2 for WhatsApp & Messenger.
llama.cpp
LLM inference in C/C++
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
llm-kick
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
pprp.github.io
Personal Academic Page for pprp
pykan
Kolmogorov-Arnold Networks
qllm-eval
Code Repository of Evaluating Quantized Large Language Models
quanto
A PyTorch quantization toolkit
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
unsloth
Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory