IST Austria Distributed Algorithms and Systems Lab (IST-DASLab)


0 followers · 0 following · 0 stars

Home Page: https://ista.ac.at/en/research/alistarh-group/


IST Austria Distributed Algorithms and Systems Lab's repositories

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".

Language: Python · License: Apache-2.0 · Stargazers: 1758 · Watchers: 29 · Issues: 48
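For context on what post-training quantization does, here is a minimal round-to-nearest (RTN) 4-bit quantizer, the baseline GPTQ improves on. This is an illustrative sketch, not code from the repository; GPTQ itself additionally uses second-order information to update remaining weights and compensate quantization error.

```python
import numpy as np

def quantize_rtn_4bit(w):
    # Per-row asymmetric 4-bit round-to-nearest quantization.
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0          # 4 bits -> 16 levels (0..15)
    zero = np.round(-wmin / scale)        # integer zero-point per row
    q = np.clip(np.round(w / scale) + zero, 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize(q, scale, zero):
    # Map 4-bit codes back to approximate float weights.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s, z = quantize_rtn_4bit(w)
w_hat = dequantize(q, s, z)
err = np.abs(w - w_hat).max()  # bounded by roughly one quantization step
```

RTN treats each weight independently; GPTQ's gain comes from quantizing columns sequentially and redistributing the resulting error over the not-yet-quantized weights.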

sparsegpt

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Language: Python · License: Apache-2.0 · Stargazers: 651 · Watchers: 16 · Issues: 31
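As a point of reference for one-shot pruning, the simplest baseline is unstructured magnitude pruning: zero out the smallest-magnitude weights in a single pass. The sketch below is illustrative only, not the repository's method; SparseGPT improves on it by solving a layer-wise reconstruction problem that updates the surviving weights.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # One-shot unstructured magnitude pruning: keep the largest |w|
    # entries, zero the rest.
    k = int(w.size * sparsity)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    mask = np.abs(w) >= thresh
    return w * mask, mask

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 8))
w_sparse, mask = magnitude_prune(w, sparsity=0.5)
```

Magnitude pruning leaves the kept weights untouched, which is why its accuracy degrades quickly at high sparsity on large models, the regime SparseGPT targets.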

marlin

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stargazers: 397 · Watchers: 13 · Issues: 21
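The semantics of an FP16xINT4 matmul can be sketched in plain NumPy: weights are stored as 4-bit codes with per-column scales and dequantized before the multiply. This reference is an assumption-laden illustration, not the Marlin kernel; a fused kernel never materializes the full FP16 weight matrix and instead dequantizes tiles in registers.

```python
import numpy as np

def int4_matmul_reference(a, q, scale):
    # Reference semantics: symmetric 4-bit codes (zero-point 8) with
    # per-output-column scales, dequantized then multiplied.
    w = (q.astype(np.float32) - 8.0) * scale
    return a.astype(np.float32) @ w

rng = np.random.default_rng(2)
a = rng.standard_normal((16, 32)).astype(np.float16)      # a "batch" of 16 tokens
q = rng.integers(0, 16, size=(32, 64)).astype(np.uint8)    # 4-bit weight codes
scale = (rng.random((1, 64)) * 0.1 + 0.01).astype(np.float32)
y = int4_matmul_reference(a, q, scale)
```

At small batch sizes the matmul is memory-bound on the weights, so reading 4-bit instead of 16-bit weights is where the ~4x speedup comes from.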

qmoe

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Language: Python · License: Apache-2.0 · Stargazers: 253 · Watchers: 6 · Issues: 4

QUIK

Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference.

Language: C++ · License: Apache-2.0 · Stargazers: 159 · Watchers: 6 · Issues: 5

OBC

Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".

SparseFinetuning

Repository for sparse fine-tuning of LLMs via a modified version of MosaicML's llm-foundry.

Language: Python · License: Apache-2.0 · Stargazers: 35 · Watchers: 5 · Issues: 0

QIGen

Repository for CPU Kernel Generation for LLM Inference

Language: Python · Stargazers: 25 · Watchers: 6 · Issues: 0

spdy

Code for ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees"

Language: C++ · License: Apache-2.0 · Stargazers: 13 · Watchers: 7 · Issues: 0

torch_cgx

PyTorch distributed backend extension with compression support.

Language: C++ · License: AGPL-3.0 · Stargazers: 13 · Watchers: 4 · Issues: 5

peft-rosa

A fork of the PEFT library, supporting Robust Adaptation (RoSA).

Language: Python · License: Apache-2.0 · Stargazers: 10 · Watchers: 0 · Issues: 0

sparse-imagenet-transfer

Code for reproducing the results in "How Well Do Sparse ImageNet Models Transfer?", presented at CVPR 2022.

Language: Python · License: Apache-2.0 · Stargazers: 8 · Watchers: 6 · Issues: 0

CrAM

Code for reproducing the results from "CrAM: A Compression-Aware Minimizer", accepted at ICLR 2023.

Language: Python · License: Apache-2.0 · Stargazers: 7 · Watchers: 5 · Issues: 0

pruned-vision-model-bias

Code for reproducing the paper "Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures"

Language: Jupyter Notebook · Stargazers: 4 · Watchers: 5 · Issues: 0

EFCP

Code to reproduce the experiments from our paper "Error Feedback Can Accurately Compress Preconditioners".

Language: Python · License: Apache-2.0 · Stargazers: 3 · Watchers: 5 · Issues: 0

CAP

Source and experimental code for Correlation-Aware Pruning (CAP, NeurIPS 2023).

Language: Python · License: Apache-2.0 · Stargazers: 2 · Watchers: 4 · Issues: 0

MicroAdam

This repository contains code for the MicroAdam paper.

Language: Python · License: Apache-2.0 · Stargazers: 2 · Watchers: 0 · Issues: 0

TACO4NLP

Task-aware compression for various NLP tasks.

DeepLearningExamples

Deep Learning Examples

Language: Python · Stargazers: 1 · Watchers: 2 · Issues: 0

KDVR

Code for the experiments in "Knowledge Distillation Performs Partial Variance Reduction" (NeurIPS 2023).

Language: Python · License: Apache-2.0 · Stargazers: 1 · Watchers: 5 · Issues: 0

ZipLM

Code for the NeurIPS 2023 paper: "ZipLM: Inference-Aware Structured Pruning of Language Models".

Stargazers: 0 · Watchers: 11 · Issues: 1

FastOBQ-

GPTQ with fine-tuning.

Stargazers: 0 · Watchers: 5 · Issues: 0

gcomp_sim_strip

Stripped-down version of gcomp_sim for an ML course.

Language: Python · Stargazers: 0 · Watchers: 3 · Issues: 0

llm-foundry

LLM training code for Databricks foundation models.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Watchers: 0 · Issues: 0