Robert Shaw's repositories
deepsparse-continuous-batching
DeepSparse Continuous Batching
llm-compressor-example
Example using llm-compressor
marlin-example
Example of quantizing and saving a model with Marlin
mistral-self-rag
Training Mistral on the Self-RAG task
vllm-benchmarking
Benchmarking vLLM
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
bert-benchmarking
Repo for benchmarking BERT performance under various scenarios
bert-server-example
DeepSparse Server Running BERT
zephyr-training
Recreating and experimenting with Zephyr
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
chat-example
Example of calling a chat API
deepsparse-llm-server-example
Example of serving a DeepSparse LLM with a basic server
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
gptq-benchmarking
Benchmarking GPTQ performance and exploring how the kernels work
gptq-experiments
Experiments running GPTQ
gptq-serialization-example
Example of GPTQ serialization
lm-evaluation-harness
A framework for few-shot evaluation of language models.
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
nm-vllm-example
Example running nm-vllm
one-shot-mpt-gsm-8k
Experiments applying one-shot compression to MPT on GSM-8K
sparse-finetuning
Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry
tgi-benchmarking
Benchmarking LLMs on GPUs with Text Generation Inference (TGI)
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
viggo-finetuning
Example of fine-tuning an LLM on the ViGGO dataset
vllm-client
Client for benchmarking vLLM
vllm-examples
Examples of benchmarking vLLM
vllm-qa-basic-correctness
Repo for basic correctness testing of vLLM