huismiling's repositories

accelerate

🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision

Language:PythonLicense:Apache-2.0Stargazers:1Issues:0Issues:0

bitsandbytes

8-bit CUDA functions for PyTorch

Language:PythonLicense:MITStargazers:0Issues:0Issues:0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

License:MITStargazers:0Issues:0Issues:0

autogen

Enable Next-Gen Large Language Model Applications. Join our Discord: https://discord.gg/pAbnFJrkgZ

License:CC-BY-4.0Stargazers:0Issues:0Issues:0

AutoGPTQ

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

License:MITStargazers:0Issues:0Issues:0

BitDistiller

A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

License:MITStargazers:0Issues:0Issues:0

CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

License:MITStargazers:0Issues:0Issues:0

ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

License:MITStargazers:0Issues:0Issues:0

deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics

Language:C++License:LGPL-3.0Stargazers:0Issues:0Issues:0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

License:MITStargazers:0Issues:0Issues:0

faiss

A library for efficient similarity search and clustering of dense vectors.

Language:C++License:MITStargazers:0Issues:0Issues:0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

License:Apache-2.0Stargazers:0Issues:0Issues:0

flash-attention

Fast and memory-efficient exact attention

License:BSD-3-ClauseStargazers:0Issues:0Issues:0
Language:PythonStargazers:0Issues:0Issues:0

llm-export

llm-export can export llm model to onnx.

Stargazers:0Issues:0Issues:0

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language:PythonLicense:MITStargazers:0Issues:0Issues:0

mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

License:Apache-2.0Stargazers:0Issues:0Issues:0

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Stargazers:0Issues:0Issues:0

optimum-benchmark

A unified multi-backend utility for benchmarking Transformers and Diffusers with support for Optimum's arsenal of hardware optimizations/quantization schemes.

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

License:Apache-2.0Stargazers:0Issues:0Issues:0

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

qwen.cpp

C++ implementation of Qwen-LM

License:NOASSERTIONStargazers:0Issues:0Issues:0

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

License:Apache-2.0Stargazers:0Issues:0Issues:0

safetensors

Simple, safe way to store and distribute tensors

Language:PythonLicense:Apache-2.0Stargazers:0Issues:0Issues:0

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language:PythonLicense:MITStargazers:0Issues:0Issues:0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

License:Apache-2.0Stargazers:0Issues:0Issues:0

triton

Development repository for the Triton language and compiler

License:MITStargazers:0Issues:0Issues:0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

License:Apache-2.0Stargazers:0Issues:0Issues:0