IlyasMoutawwakil

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

Language: Python | License: MIT | 47100

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | 483600

ruff

An extremely fast Python linter and code formatter, written in Rust.

Language: Rust | License: MIT | 2876400

llama-cpp-python

Python bindings for llama.cpp

Language: Python | License: MIT | 712400

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python | 23200

float8_experimental

This repository contains the experimental PyTorch native float8 training UX

Language: Python | License: BSD-3-Clause | 18600

alphageometry

Language: Python | License: Apache-2.0 | 380900

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 2228100

amdsmi

AMD SMI

Language: C++ | License: MIT | 2900

hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language: Python | License: Apache-2.0 | 56200

exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

Language: Python | License: MIT | 266400

AutoAWQ_kernels

Language: Cuda | License: MIT | 3200

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python | License: Apache-2.0 | 44300

codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.

Language: Python | License: MIT | 100800

optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)

Language: Python | License: Apache-2.0 | 12300

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | License: BSD-3-Clause | 535200

gpu-benches

Collection of benchmarks to measure basic GPU capabilities

Language: Jupyter Notebook | License: GPL-3.0 | 17100