Peyton's repositories
SimpleCVPaperReading
:smile: Blog paper reading list, organized by series
Awesome-LLM-Prune
Awesome list for LLM pruning.
Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Awesome-Mamba-Papers
Awesome Papers related to Mamba.
BitDistiller
A novel QAT with Self-Distillation framework to enhance ultra-low-bit LLMs.
corenet
CoreNet: A library for training deep neural networks
DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
Firefly
Firefly: a large language model training toolkit, supporting training of Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
llama-recipes
Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization & question answering, plus a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Llama2 for WhatsApp & Messenger.
llama.cpp
LLM inference in C/C++
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
llm-kick
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing llms: The truth is rarely pure and never simple.
mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
pprp.github.io
Personal Academic Page for pprp
pykan
Kolmogorov-Arnold Networks
qllm-eval
Code Repository of Evaluating Quantized Large Language Models
quanto
A PyTorch quantization toolkit
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
unsloth
Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory