Abhilash Majumder's repositories
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
alive2
Automatic verification of LLVM optimizations
ao
PyTorch native quantization and sparsity for training and inference
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
bitsandbytes
8-bit CUDA functions for PyTorch
bitsandbytes-SYCL
Hosts SYCL kernels for bitsandbytes for experimental purposes.
cutlass-fork
CUDA Templates for Linear Algebra Subroutines
draft
C++ standards drafts
flashinfer
FlashInfer: Kernel Library for LLM Serving
Liger-Kernel
Efficient Triton Kernels for LLM Training
llvm
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
ollama
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
PaddleCustomDevice
PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle)
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.