Robert Shaw's repositories
deepsparse-continuous-batching
DeepSparse Continuous Batching
llm-compressor-example
Example using llm-compressor
marlin-example
Example of quantizing and saving a model with Marlin
mistral-self-rag
Training Mistral on the Self-RAG task
vllm-benchmarking
Benchmarking vLLM
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
bert-benchmarking
Repo for benchmarking BERT performance under various scenarios
bert-server-example
DeepSparse Server Running BERT
zephyr-training
Recreating and experimenting with Zephyr
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
chat-example
Example of calling a chat API
deepsparse-llm-server-example
Example of serving a DeepSparse LLM with a basic server
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
gptq-benchmarking
Benchmarking GPTQ performance and exploring how the kernels work
gptq-experiments
Experiments running GPTQ
gptq-serialization-example
Example of GPTQ serialization
lm-evaluation-harness
A framework for few-shot evaluation of language models.
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
nm-vllm-example
Example running nm-vllm
one-shot-mpt-gsm-8k
Experiments applying one-shot compression to MPT on GSM-8K
sparse-finetuning
Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry
tgi-benchmarking
Benchmarking LLMs on GPUs with Text Generation Inference (TGI)
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
viggo-finetuning
Example of fine-tuning an LLM on the ViGGO dataset
vllm-client
Client for benchmarking vLLM
vllm-examples
Examples of benchmarking vLLM
vllm-qa-basic-correctness
Repo for basic correctness testing of vLLM