Abhilash Majumder's repositories
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed-precision support
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
ao
PyTorch native quantization and sparsity for training and inference
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
bitsandbytes
8-bit CUDA functions for PyTorch
bitsandbytes-SYCL
Hosts experimental SYCL kernels for bitsandbytes.
cuda-samples
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
cutlass-fork
CUDA Templates for Linear Algebra Subroutines
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
Liger-Kernel
Efficient Triton Kernels for LLM Training
llvm
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
ollama
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
PaddleCustomDevice
PaddlePaddle custom device implementation.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
sycl-for-cuda
Codeplay's project for contributions to the LLVM SYCL implementation
tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
torch-mlir
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.