Shreyansh Singh (shreyansh26)

Company: Level AI

Location: New Delhi

Home Page: https://shreyansh26.github.io

Twitter: @shreyansh_26

Organizations
COPS-IITBHU

Shreyansh Singh's starred repositories

neovim

Vim-fork focused on extensibility and usability

Language: Vim Script · License: NOASSERTION · Stargazers: 82150 · Issues: 974 · Issues: 11724

llama-stack-apps

Agentic components of the Llama Stack APIs

Language: Python · License: MIT · Stargazers: 3623 · Issues: 40 · Issues: 47

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · License: BSD-2-Clause · Stargazers: 3110 · Issues: 35 · Issues: 71

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · Stargazers: 1238 · Issues: 24 · Issues: 44
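
For context, the core trick behind linear-attention models is to replace softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), avoiding the quadratic score matrix. The sketch below is a minimal NumPy illustration of that idea (the feature map, shapes, and non-causal form are assumptions for illustration), not the repository's PyTorch/Triton kernels.

```python
# Minimal NumPy sketch of the linear-attention idea: softmax(Q K^T) V is
# replaced by phi(Q) (phi(K)^T V), which costs O(n * d^2) instead of O(n^2 * d).
# The feature map phi (elu + 1) and the shapes are illustrative assumptions.
import numpy as np

def elu_plus_one(x):
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Q, K: (n, d), V: (n, d)
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)
    kv = Kf.T @ V                      # (d, d), computed once
    z = Qf @ Kf.sum(axis=0)            # (n,) normalizer
    return (Qf @ kv) / z[:, None]      # (n, d)

n, d = 128, 64
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (128, 64)
```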

nano-llama31

nanoGPT-style version of Llama 3.1

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Language: C++ · License: NOASSERTION · Stargazers: 1176 · Issues: 66 · Issues: 165

Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

nanoT5

Fast & simple repository for pre-training and fine-tuning T5-style models

Language: Python · License: Apache-2.0 · Stargazers: 961 · Issues: 17 · Issues: 39

rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Language: Python · License: Apache-2.0 · Stargazers: 949 · Issues: 9 · Issues: 16
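
For illustration, a cross-encoder reranker scores each (query, document) pair jointly and re-orders documents by that score. The sketch below uses sentence-transformers' CrossEncoder as a stand-in; it is not the rerankers package's own API, and the model name and texts are assumptions.

```python
# Generic cross-encoder reranking sketch (not the rerankers package's API):
# the model scores each (query, document) pair jointly, and documents are
# re-ordered by that score. Model name and texts are illustrative.
from sentence_transformers import CrossEncoder

query = "how do transformers handle long contexts?"
docs = [
    "FlashAttention reduces memory traffic in exact attention.",
    "BM25 is a classical lexical ranking function.",
    "Rotary embeddings can be extended for longer sequences.",
]

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, d) for d in docs])   # one relevance score per pair
ranked = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```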

prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.

Language: Python · License: MIT · Stargazers: 861 · Issues: 6 · Issues: 6

bm25s

Fast lexical search library implementing BM25 in Python using NumPy and SciPy

Language: Python · License: MIT · Stargazers: 792 · Issues: 4 · Issues: 24
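
For reference, BM25 scores a document against a query by combining per-term IDF with a saturating, length-normalized term frequency. The sketch below is a from-scratch illustration of that scoring function on a toy corpus (k1 and b defaults assumed); it is not the bm25s package's API.

```python
# From-scratch sketch of the Okapi BM25 scoring function; not the bm25s API.
# k1, b and the toy corpus are illustrative defaults.
import math
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "dogs and cats are pets".split(),
    "the quick brown fox".split(),
]

N = len(corpus)
avgdl = sum(len(d) for d in corpus) / N
df = Counter(t for doc in corpus for t in set(doc))   # document frequency per term

def bm25(query, doc, k1=1.5, b=0.75):
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if t not in tf:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

query = "cat on mat".split()
print([round(bm25(query, d), 3) for d in corpus])
```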

cookbook

Deep learning for dummies. All the practical details and useful utilities that go into working with real models.

Language: Python · License: Apache-2.0 · Stargazers: 679 · Issues: 13 · Issues: 14

Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda · License: Apache-2.0 · Stargazers: 545 · Issues: 5 · Issues: 14

llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Language: Python · License: Apache-2.0 · Stargazers: 514 · Issues: 13 · Issues: 70

Sophia

Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 376 · Issues: 8 · Issues: 25
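
For reference, the update reported for Sophia-style optimizers is roughly of the form below: an EMA of gradients divided by an EMA of a diagonal Hessian estimate (refreshed every k steps), with element-wise clipping. Constants and details are simplified here; see the paper for the exact algorithm.

```latex
% Rough form of the Sophia-style update (simplified; see the paper for details):
% m_t is an EMA of gradients, h_t an EMA of a diagonal Hessian estimate,
% and the ratio is clipped element-wise before the step.
\[
  m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
  h_t = \beta_2 h_{t-k} + (1-\beta_2)\, \hat{h}_t,
\]
\[
  \theta_{t+1} = \theta_t - \eta \,\operatorname{clip}\!\left(\frac{m_t}{\max(\gamma\, h_t,\ \epsilon)},\, 1\right)
\]
```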

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Language: Cuda · License: MIT · Stargazers: 270 · Issues: 4 · Issues: 12

hash-hop

Long context evaluation for large language models

Language: Python · License: MIT · Stargazers: 172 · Issues: 6 · Issues: 3

Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related works will be added gradually. Contributions are welcome!

Language: Python · License: Apache-2.0 · Stargazers: 152 · Issues: 13 · Issues: 26

awesome-llm-planning-reasoning

A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.

applied-ai

Applied AI experiments and examples for PyTorch

Language: Python · License: BSD-3-Clause · Stargazers: 136 · Issues: 12 · Issues: 9

flashattention2-custom-mask

Triton implementation of FlashAttention2 that adds support for custom masks.

Language: Python · License: Apache-2.0 · Stargazers: 63 · Issues: 4 · Issues: 4
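
For clarity on what "custom masks" means here: an arbitrary boolean mask is applied to the attention score matrix before the softmax, rather than only the standard causal pattern. The sketch below is a plain PyTorch reference of that computation (shapes and mask pattern assumed for illustration), not the repository's fused Triton kernel.

```python
# Reference (non-Triton) sketch of attention with a custom mask: an arbitrary
# boolean mask is applied to the score matrix before softmax. A fused kernel
# computes the same thing block-wise; shapes here are illustrative.
import torch

def masked_attention(q, k, v, mask):
    # q, k, v: (batch, heads, seq, dim); mask: (seq, seq) boolean, True = keep
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

B, H, S, D = 1, 2, 8, 16
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
custom_mask = torch.rand(S, S) > 0.3        # any pattern, not just causal
custom_mask.fill_diagonal_(True)            # keep at least the diagonal
out = masked_attention(q, k, v, custom_mask)
print(out.shape)  # torch.Size([1, 2, 8, 16])
```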

Guide-NVIDIA-Tools

NVIDIA tools guide

Language: Cuda · Stargazers: 61 · Issues: 0 · Issues: 0

benchmark

Benchmark suite for LLMs from Fireworks.ai

Language: Python · License: Apache-2.0 · Stargazers: 52 · Issues: 4 · Issues: 1

cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Language: Cuda · License: MIT · Stargazers: 45 · Issues: 3 · Issues: 0

Palu

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Language: Python · License: MIT · Stargazers: 42 · Issues: 2 · Issues: 3
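
For context, compressing a KV-cache with a low-rank projection means storing a small latent (e.g. K·A) and reconstructing the keys/values when attending. The sketch below is a conceptual NumPy illustration using an SVD-based factorization and an assumed rank; it is not Palu's actual method.

```python
# Conceptual sketch of KV-cache compression via low-rank projection:
# cache the low-dimensional latent K @ A and reconstruct K ~= (K @ A) @ B on use.
# The SVD-based factors, rank, and toy data are illustrative, not Palu's method.
import numpy as np

rng = np.random.default_rng(0)
seq_len, head_dim, rank = 256, 128, 32

# Toy, approximately low-rank key states for one attention head.
K = rng.normal(size=(seq_len, rank)) @ rng.normal(size=(rank, head_dim))
K += 0.01 * rng.normal(size=K.shape)

# Offline: fit a rank-r factorization (here simply via SVD of the states).
U, S, Vt = np.linalg.svd(K, full_matrices=False)
A = Vt[:rank].T                                   # (head_dim, rank) down-projection
B = Vt[:rank]                                     # (rank, head_dim) up-projection

latent = K @ A                                    # store this: (seq_len, rank)
K_hat = latent @ B                                # reconstruct when attending

compression = latent.size / K.size
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(f"stored {compression:.0%} of the original cache, relative error {err:.3f}")
```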

lovely-llama

An implementation of the Llama architecture, to instruct and delight

Language: Python · License: MIT · Stargazers: 21 · Issues: 0 · Issues: 0

hydragen

Hydragen: High-Throughput LLM Inference with Shared Prefixes

Language: Python · License: Apache-2.0 · Stargazers: 18 · Issues: 1 · Issues: 2

hip-attention

Training-free, post-training, efficient sub-quadratic-complexity attention, implemented with OpenAI Triton.

Language: Python · Stargazers: 14 · Issues: 5 · Issues: 0