Hasan-Syed25

Syed Hasan Abbas's starred repositories

RefAug

Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"

Language: Python · Apache-2.0 · 4200

late-chunking

Code for explaining and evaluating late chunking (chunked pooling)

Language: Python · Apache-2.0 · 16400

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python · Apache-2.0 · 8400

swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Language: Python · Apache-2.0 · 8800

T-MAC

Low-bit LLM inference on CPU with lookup table

Language: C++ · MIT · 47300

Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

GPL-3.0 · 260900

Palu

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Language: Python · MIT · 4400

[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python · MIT · 72900

Predictive-Maintenance-using-LSTM

Example of Multiple Multivariate Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras.

Language: Python · MIT · 62400

Q-GaLore

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

Language: Python · Apache-2.0 · 16300

buffer-of-thought-llm

[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Language: Python · MIT · 50700

GLiNER

Generalist and Lightweight Model for Named Entity Recognition (extracts any entity type from text) @ NAACL 2024

Language: Python · Apache-2.0 · 132200

LLM101n

LLM101n: Let's build a Storyteller

2924000

LLMTest_NeedleInAHaystack

Simple retrieval from LLMs at various context lengths to measure accuracy

Language: Jupyter Notebook · NOASSERTION · 149200

spectrum

Language: Python · Apache-2.0 · 8100

kraken

Language: Jupyter Notebook · Apache-2.0 · 6400

lectures

Material for gpu-mode lectures

Language: Jupyter Notebook · Apache-2.0 · 265700

movie-recommender

Language: Python · MIT · 300

real-time-data-pipelines-in-python

Real-time Feature Pipelines in Python ⚡

Language: Python · 23300

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

Language: Python · MIT · 66800

mirascope

LLM abstractions that aren't obstructions

Language: Python · MIT · 70200

chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting

Language: Python · Apache-2.0 · 240700

linear_open_lm

A repository for research on medium-sized language models.

Language: Python · MIT · 7200

nanoXLSTM

The simplest, fastest repository for training/finetuning medium-sized xLSTMs.

Language: Python · MIT · 3800

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · MIT · 2389100

NOLA

Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"

Language: Python · MIT · 4700

PruneMe

Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models

Language: Python · 18400

Infini-Attention

Efficient Infinite Context Transformers with Infini-attention: PyTorch implementation, QwenMoE implementation, training script, and 1M context keypass retrieval

Language: Python · 6400

SplitApp

MERN Stack Group Expense Splitting Application

Language: JavaScript · MIT · 2900

detect-pretrain-code-contamination

Language: Python · 7300