huismiling's repositories
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
bitsandbytes
8-bit CUDA functions for PyTorch
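The core idea behind 8-bit methods like those in bitsandbytes can be illustrated with absmax quantization: scale a tensor so its largest magnitude maps to 127, round to int8, and keep the scale for dequantization. This is a minimal NumPy sketch of the concept only, not bitsandbytes' actual CUDA kernels or its block-wise/outlier handling.

```python
import numpy as np

def quantize_absmax_int8(x):
    # Scale so the largest magnitude maps to 127, then round to int8.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover an approximation of the original float tensor.
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_absmax_int8(x)
x_hat = dequantize_int8(q, s)
# Per-element reconstruction error is bounded by half a quantization step.
assert np.abs(x - x_hat).max() <= s / 2 + 1e-6
```

Real int8 training/inference kernels additionally quantize per block or per channel so a single outlier does not blow up the scale for the whole tensor.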
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
autogen
Enable Next-Gen Large Language Model Applications. Join our Discord: https://discord.gg/pAbnFJrkgZ
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
BitDistiller
A novel QAT framework with self-distillation to enhance ultra-low-bit LLMs.
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
deepmd-kit
A deep learning package for many-body potential energy representation and molecular dynamics
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
faiss
A library for efficient similarity search and clustering of dense vectors.
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
flash-attention
Fast and memory-efficient exact attention
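The "exact attention" claim rests on the online-softmax trick: process keys and values in blocks while carrying a running max, normalizer, and output accumulator, so the full attention matrix is never materialized. Below is a NumPy sketch of that trick under simplified assumptions (single head, no masking); the real flash-attention library implements this as fused CUDA kernels.

```python
import numpy as np

def attention_reference(q, k, v):
    # Standard softmax attention, materializing the full score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def attention_tiled(q, k, v, block=4):
    # Online softmax: visit K/V in blocks, keeping a running max (m),
    # normalizer (l), and unnormalized output accumulator (acc).
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)
    l = np.zeros(q.shape[0])
    acc = np.zeros_like(q, dtype=np.float64)
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        correction = np.exp(m - m_new)          # rescale stats from earlier blocks
        p = np.exp(s - m_new[:, None])
        l = l * correction + p.sum(axis=-1)
        acc = acc * correction[:, None] + p @ vb
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(attention_reference(q, k, v), attention_tiled(q, k, v))
```

Because each block only needs O(block × d) memory, the memory cost is linear in sequence length while the result stays bit-for-bit mathematically equivalent to full softmax attention.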
llm-export
llm-export can export LLM models to ONNX.
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
optimum-benchmark
A unified multi-backend utility for benchmarking Transformers and Diffusers with support for Optimum's arsenal of hardware optimizations/quantization schemes.
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices).
Qwen
The official repo of Qwen (通义千问), the chat & pretrained large language model proposed by Alibaba Cloud.
qwen.cpp
C++ implementation of Qwen-LM
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
safetensors
Simple, safe way to store and distribute tensors
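Part of what makes safetensors "safe" is that the file layout is trivially simple: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/offsets, then raw tensor bytes — no pickled code to execute. This pure-stdlib sketch round-trips a single float32 tensor through that documented layout; the real library supports many dtypes, validation, and zero-copy loading.

```python
import json, struct

def save_one_f32(path, name, values, shape):
    # safetensors layout: [8-byte LE header length][JSON header][raw LE data].
    data = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps({
        name: {"dtype": "F32", "shape": shape, "data_offsets": [0, len(data)]}
    }).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))
        f.write(header)
        f.write(data)

def load_one_f32(path, name):
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))       # header length
        info = json.loads(f.read(n))[name]
        begin, end = info["data_offsets"]           # offsets into the data region
        f.seek(8 + n + begin)
        raw = f.read(end - begin)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw)), info["shape"]

save_one_f32("demo.safetensors", "w", [1.0, 2.0, 3.0, 4.0], [2, 2])
vals, shape = load_one_f32("demo.safetensors", "w")
assert vals == [1.0, 2.0, 3.0, 4.0] and shape == [2, 2]
```

Since loading is just a JSON parse plus a bounded read, a malformed or malicious file cannot trigger arbitrary code execution the way a pickle-based checkpoint can.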
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
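SmoothQuant's key observation is that a matmul Y = XW is invariant under per-channel rescaling: divide activation channels by a smoothing factor s and multiply the matching weight rows by s, and the product is unchanged, but activation outliers shrink, making X easier to quantize. A NumPy sketch of that scale migration (the alpha=0.5 factor mirrors the paper's migration-strength idea; this is an illustration, not the repo's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 8))
X[:, 3] *= 50.0                      # simulate an activation outlier channel
W = rng.standard_normal((8, 16))

# Migrate quantization difficulty from activations to weights:
# s_j = max|X_j|^alpha / max|W_j|^(1-alpha)
alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)
X_s, W_s = X / s, W * s[:, None]

# The product is mathematically unchanged...
assert np.allclose(X @ W, X_s @ W_s)
# ...but the activation outlier is tamed, so X quantizes with less error.
assert np.abs(X_s).max() < np.abs(X).max()
```

Because the rescaling can be folded into the previous layer's weights offline, the smoothing adds no inference-time cost.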
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs