Abhilash Majumder's repositories
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
ao
PyTorch native quantization and sparsity for training and inference
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
bitsandbytes
8-bit CUDA functions for PyTorch
bitsandbytes-SYCL
Hosts experimental SYCL kernels for bitsandbytes.
cuda-samples
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
cutlass-fork
CUDA Templates for Linear Algebra Subroutines
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
Liger-Kernel
Efficient Triton Kernels for LLM Training
llvm
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
ollama
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
PaddleCustomDevice
PaddlePaddle custom device implementation.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
sycl-for-cuda
Codeplay's project for contributions to the LLVM SYCL implementation
tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
torch-mlir
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.