jeromeku's repositories


accelerated-scan

Accelerated First Order Parallel Associative Scan

Language: Cuda | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

ao

torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch.

License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

bloop

bloop is a fast code search engine written in Rust.

Language: Rust | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

candle

Minimalist ML framework for Rust

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

colab-connect

Connect to Google Colab VM from your local VSCode

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

FlagAttention

A collection of memory efficient attention operators implemented in the Triton language.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

GEMM_MMA

Optimize GEMM with tensorcore step by step

Stargazers: 0 | Issues: 0 | Issues: 0

haystack

LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LLM-Training-Puzzles

What would you do with 1000 H100s...

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

neurips_llm_efficiency_challenge

NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

punica

Serving multiple LoRA-finetuned LLMs as one

Stargazers: 0 | Issues: 0 | Issues: 0

rust-telemetry-workshop

A workshop that introduces participants to a comprehensive toolkit to detect, troubleshoot and resolve issues with Rust applications.

Stargazers: 0 | Issues: 0 | Issues: 0

stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

toydb

Distributed SQL database in Rust, written as a learning project

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

trident

A performance library for machine learning applications.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

unsloth

QLoRA finetuning that is 5x faster and uses 60% less memory

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

xtuner

XTuner is a toolkit for efficiently fine-tuning LLMs

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0