Q7bao's starred repositories

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Language:PythonLicense:Apache-2.0Stargazers:1845Issues:0Issues:0

AQLM

Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf

Language:PythonLicense:Apache-2.0Stargazers:1091Issues:0Issues:0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language:PythonLicense:MITStargazers:19397Issues:0Issues:0

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Language:CudaLicense:MITStargazers:254Issues:0Issues:0

GPTQ-triton

GPTQ inference Triton kernel

Language:Jupyter NotebookLicense:Apache-2.0Stargazers:272Issues:0Issues:0

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Language:PythonLicense:MITStargazers:4512Issues:0Issues:0

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language:PythonLicense:MITStargazers:35811Issues:0Issues:0

WizardLM

LLMs build upon Evol Insturct: WizardLM, WizardCoder, WizardMath

Language:PythonStargazers:9165Issues:0Issues:0

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language:PythonLicense:Apache-2.0Stargazers:9108Issues:0Issues:0

LLM-QAT

Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"

Language:PythonLicense:NOASSERTIONStargazers:225Issues:0Issues:0

examples

Fast and flexible reference benchmarks

Language:ShellLicense:Apache-2.0Stargazers:432Issues:0Issues:0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language:C++License:Apache-2.0Stargazers:7976Issues:0Issues:0

abnormal-floats

Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)

Language:PythonStargazers:15Issues:0Issues:0

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License:NOASSERTIONStargazers:26176Issues:0Issues:0

fucking-algorithm

刷算法全靠套路,认准 labuladong 就够了!English version supported! Crack LeetCode, not only how, but also why.

Language:MarkdownStargazers:124721Issues:0Issues:0

ScreenshotScraper

ScreenshotScraper is used to take screenshots of web pages and to create PDFs from these screenshots. There are two versions of ScreenshotScraper, the OS version, OS_Screenshot, and the web version, Web_Screenshot.

Language:PythonStargazers:8Issues:0Issues:0