IST Austria Distributed Algorithms and Systems Lab (IST-DASLab)


0 followers · 0 following · 0 stars

Home Page: https://ista.ac.at/en/research/alistarh-group/


IST Austria Distributed Algorithms and Systems Lab's repositories

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".

Language: Python · License: Apache-2.0 · Stargazers: 1758 · Watchers: 29 · Issues: 48
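For context on what post-training quantization does, here is a minimal round-to-nearest (RTN) 4-bit quantizer, the baseline GPTQ improves on. This is an illustrative sketch, not code from the repository; GPTQ itself additionally uses second-order information to update remaining weights and compensate quantization error.

```python
import numpy as np

def quantize_rtn_4bit(w):
    # Per-row asymmetric 4-bit round-to-nearest quantization.
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0          # 4 bits -> 16 levels (0..15)
    zero = np.round(-wmin / scale)        # integer zero-point per row
    q = np.clip(np.round(w / scale) + zero, 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize(q, scale, zero):
    # Map 4-bit codes back to approximate float weights.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s, z = quantize_rtn_4bit(w)
w_hat = dequantize(q, s, z)
err = np.abs(w - w_hat).max()  # bounded by roughly one quantization step
```

RTN treats each weight independently; GPTQ's gain comes from quantizing columns sequentially and redistributing the resulting error over the not-yet-quantized weights.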

sparsegpt

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Language: Python · License: Apache-2.0 · Stargazers: 651 · Watchers: 16 · Issues: 31
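As a point of reference for one-shot pruning, the simplest baseline is unstructured magnitude pruning: zero out the smallest-magnitude weights in a single pass. The sketch below is illustrative only, not the repository's method; SparseGPT improves on it by solving a layer-wise reconstruction problem that updates the surviving weights.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # One-shot unstructured magnitude pruning: keep the largest |w|
    # entries, zero the rest.
    k = int(w.size * sparsity)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    mask = np.abs(w) >= thresh
    return w * mask, mask

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 8))
w_sparse, mask = magnitude_prune(w, sparsity=0.5)
```

Magnitude pruning leaves the kept weights untouched, which is why its accuracy degrades quickly at high sparsity on large models, the regime SparseGPT targets.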

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stargazers: 397 · Watchers: 13 · Issues: 21
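The semantics of an FP16xINT4 matmul can be sketched in plain NumPy: weights are stored as 4-bit codes with per-column scales and dequantized before the multiply. This reference is an assumption-laden illustration, not the Marlin kernel; a fused kernel never materializes the full FP16 weight matrix and instead dequantizes tiles in registers.

```python
import numpy as np

def int4_matmul_reference(a, q, scale):
    # Reference semantics: symmetric 4-bit codes (zero-point 8) with
    # per-output-column scales, dequantized then multiplied.
    w = (q.astype(np.float32) - 8.0) * scale
    return a.astype(np.float32) @ w

rng = np.random.default_rng(2)
a = rng.standard_normal((16, 32)).astype(np.float16)      # a "batch" of 16 tokens
q = rng.integers(0, 16, size=(32, 64)).astype(np.uint8)    # 4-bit weight codes
scale = (rng.random((1, 64)) * 0.1 + 0.01).astype(np.float32)
y = int4_matmul_reference(a, q, scale)
```

At small batch sizes the matmul is memory-bound on the weights, so reading 4-bit instead of 16-bit weights is where the ~4x speedup comes from.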

qmoe

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Language: Python · License: Apache-2.0 · Stargazers: 253 · Watchers: 6 · Issues: 4

QUIK

Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.

Language: C++ · License: Apache-2.0 · Stargazers: 159 · Watchers: 6 · Issues: 5

OBC

Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".

SparseFinetuning

Repository for sparse fine-tuning of LLMs via a modified version of MosaicML's llm-foundry.

Language: Python · License: Apache-2.0 · Stargazers: 35 · Watchers: 5 · Issues: 0

QIGen

Repository for CPU Kernel Generation for LLM Inference

Language: Python · Stargazers: 25 · Watchers: 6 · Issues: 0

spdy

Code for ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees"

Language: C++ · License: Apache-2.0 · Stargazers: 13 · Watchers: 7 · Issues: 0

torch_cgx

PyTorch distributed backend extension with compression support.

Language: C++ · License: AGPL-3.0 · Stargazers: 13 · Watchers: 4 · Issues: 5

peft-rosa

A fork of the PEFT library, supporting Robust Adaptation (RoSA).

Language: Python · License: Apache-2.0 · Stargazers: 10 · Watchers: 0 · Issues: 0

sparse-imagenet-transfer

Code for reproducing the results in "How Well Do Sparse ImageNet Models Transfer?", presented at CVPR 2022.

Language: Python · License: Apache-2.0 · Stargazers: 8 · Watchers: 6 · Issues: 0

CrAM

Code for reproducing the results from "CrAM: A Compression-Aware Minimizer", accepted at ICLR 2023.

Language: Python · License: Apache-2.0 · Stargazers: 7 · Watchers: 5 · Issues: 0

pruned-vision-model-bias

Code for reproducing the paper "Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures"

Language: Jupyter Notebook · Stargazers: 4 · Watchers: 5 · Issues: 0

EFCP

Code to reproduce the experiments from our paper "Error Feedback Can Accurately Compress Preconditioners".

Language: Python · License: Apache-2.0 · Stargazers: 3 · Watchers: 5 · Issues: 0

CAP

Source and experimental code for Correlation-Aware Pruning (CAP, NeurIPS 2023).

Language: Python · License: Apache-2.0 · Stargazers: 2 · Watchers: 4 · Issues: 0

MicroAdam

This repository contains code for the MicroAdam paper.

Language: Python · License: Apache-2.0 · Stargazers: 2 · Watchers: 0 · Issues: 0

TACO4NLP

Task-aware compression for various NLP tasks.

DeepLearningExamples

Deep Learning Examples

Language: Python · Stargazers: 1 · Watchers: 2 · Issues: 0

KDVR

Code for the experiments in "Knowledge Distillation Performs Partial Variance Reduction" (NeurIPS 2023).

Language: Python · License: Apache-2.0 · Stargazers: 1 · Watchers: 5 · Issues: 0

ZipLM

Code for the NeurIPS 2023 paper: "ZipLM: Inference-Aware Structured Pruning of Language Models".

Stargazers: 0 · Watchers: 11 · Issues: 1

FastOBQ-

GPTQ with fine-tuning.

Stargazers: 0 · Watchers: 5 · Issues: 0

gcomp_sim_strip

Stripped-down version of gcomp_sim for an ML course.

Language: Python · Stargazers: 0 · Watchers: 3 · Issues: 0

llm-foundry

LLM training code for Databricks foundation models.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Watchers: 0 · Issues: 0