Ilyas Moutawwakil's starred repositories
gpu-benches
Collection of benchmarks to measure basic GPU capabilities
llm-perf-backend
The backend behind the LLM-Perf Leaderboard
optimum-amd
AMD related optimizations for transformer models
scrape-open-llm-leaderboard
Scrape and export data from the Open LLM Leaderboard.
TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those TensorRT engines.
GPU-Puzzles
Solve puzzles. Learn CUDA.
text-embeddings-inference
A blazing-fast inference solution for text embedding models
optimum-quanto
A PyTorch quantization backend for Optimum
attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
llm-vscode
LLM-powered development for VSCode
diffusion-models-class
Materials for the Hugging Face Diffusion Models Course
cuda-python
CUDA Python Low-level Bindings
onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
private-gpt
Interact with your documents using the power of GPT, 100% privately, no data leaks