Syed Hasan Abbas (Hasan-Syed25)

Location: Pakistan

Syed Hasan Abbas's starred repositories

RefAug

Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"

Language: Python | License: Apache-2.0 | Stargazers: 42 | Issues: 0

late-chunking

Code for explaining and evaluating late chunking (chunked pooling)

Language: Python | License: Apache-2.0 | Stargazers: 164 | Issues: 0

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python | License: Apache-2.0 | Stargazers: 84 | Issues: 0

swiftLLM

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Language: Python | License: Apache-2.0 | Stargazers: 88 | Issues: 0

T-MAC

Low-bit LLM inference on CPU with lookup table

Language: C++ | License: MIT | Stargazers: 473 | Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | Stargazers: 2609 | Issues: 0

Palu

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Language: Python | License: MIT | Stargazers: 44 | Issues: 0

MInference

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python | License: MIT | Stargazers: 729 | Issues: 0

Predictive-Maintenance-using-LSTM

Example of Multiple Multivariate Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras.

Language: Python | License: MIT | Stargazers: 624 | Issues: 0

Q-GaLore

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.

Language: Python | License: Apache-2.0 | Stargazers: 163 | Issues: 0

buffer-of-thought-llm

[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Language: Python | License: MIT | Stargazers: 507 | Issues: 0

GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024

Language: Python | License: Apache-2.0 | Stargazers: 1322 | Issues: 0

LLM101n

LLM101n: Let's build a Storyteller

Stargazers: 29240 | Issues: 0

LLMTest_NeedleInAHaystack

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 1492 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 81 | Issues: 0
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 64 | Issues: 0

lectures

Material for gpu-mode lectures

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 2657 | Issues: 0
Language: Python | License: MIT | Stargazers: 3 | Issues: 0

real-time-data-pipelines-in-python

Real-time Feature Pipelines in Python ⚡

Language: Python | Stargazers: 233 | Issues: 0

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

Language: Python | License: MIT | Stargazers: 668 | Issues: 0

mirascope

LLM abstractions that aren't obstructions

Language: Python | License: MIT | Stargazers: 702 | Issues: 0

chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting

Language: Python | License: Apache-2.0 | Stargazers: 2407 | Issues: 0

linear_open_lm

A repository for research on medium-sized language models.

Language: Python | License: MIT | Stargazers: 72 | Issues: 0

nanoXLSTM

The simplest, fastest repository for training/finetuning medium-sized xLSTMs.

Language: Python | License: MIT | Stargazers: 38 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 23891 | Issues: 0

NOLA

Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis"

Language: Python | License: MIT | Stargazers: 47 | Issues: 0

PruneMe

Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models

Language: Python | Stargazers: 184 | Issues: 0

Infini-Attention

Efficient Infinite Context Transformers with Infini-attention: PyTorch implementation, QwenMoE implementation, training script, and 1M-context passkey retrieval

Language: Python | Stargazers: 64 | Issues: 0

SplitApp

MERN Stack Group Expense Splitting Application

Language: JavaScript | License: MIT | Stargazers: 29 | Issues: 0