alphaRGB


Company: XiDAIN

Location: Xi'an


alphaRGB's starred repositories

activitywatch

The best free and open-source automated time tracker. Cross-platform, extensible, privacy-focused.

Language: Python · License: MPL-2.0 · Stars: 11,412

FP8-Emulation-Toolkit

PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.

Language: Python · License: BSD-3-Clause · Stars: 89
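
The core trick behind such emulation toolkits can be shown in a few lines: round each FP32 value to the nearest value representable in the target low-precision format. Below is a minimal, illustrative sketch of E4M3-style rounding (saturating at the format maximum, ignoring NaN/Inf and subnormal encodings; this is not the toolkit's actual API):

```python
import math

def fp8_e4m3_round(x):
    # Round a float to the nearest value of a simplified E4M3-style FP8
    # format: 1 sign bit, 4 exponent bits, 3 mantissa bits. Illustrative
    # only -- saturating, and ignoring NaN/Inf and subnormal encodings.
    MAX_E4M3 = 448.0                      # largest E4M3 normal value
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = min(abs(x), MAX_E4M3)             # saturate to the format's max
    m, e = math.frexp(a)                  # a = m * 2**e with m in [0.5, 1)
    # keep 3 explicit mantissa bits (implicit bit + 3 stored bits = 16 steps)
    q = round(m * 16) / 16
    return sign * q * 2.0 ** e
```

For example, 3.1 falls between the representable values 3.0 and 3.25 and rounds down to 3.0, while out-of-range values saturate to 448.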

ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python · License: Apache-2.0 · Stars: 1,462

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python · License: MIT · Stars: 1,110
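
The SmoothQuant idea is to migrate quantization difficulty from activations to weights with per-channel factors s_j = max|X_j|^α / max|W_j|^(1−α): activations are divided by s_j and weights multiplied by s_j, which leaves the layer output unchanged while shrinking activation outliers. A minimal sketch of the scale computation (plain Python, illustrative names, not the repo's API):

```python
def smoothquant_scales(act_absmax, w_absmax, alpha=0.5):
    # Per-channel smoothing factors s_j = max|X_j|**alpha / max|W_j|**(1-alpha).
    # Dividing activations by s_j and multiplying weights by s_j keeps
    # Y = (X / s) @ (s * W) mathematically unchanged while flattening
    # activation outliers into the weights.
    return [(a ** alpha) / (w ** (1.0 - alpha))
            for a, w in zip(act_absmax, w_absmax)]
```

With the default α = 0.5, a channel whose activations are 16× larger than its weights gets a scale of 4, halving the gap in both directions.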

llmtools

Finetuning Large Language Models on One Consumer GPU in Under 4 Bits

Language: Python · Stars: 687

Quantformer

This is the official PyTorch implementation of the paper *Quantformer: Learning Extremely Low-precision Vision Transformers*.

Language: Python · License: Apache-2.0 · Stars: 18

micronet

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), both high-bit (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) at 8-bit (TensorRT); (2) pruning: normal, regular, and group-convolution channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.

Language: Python · License: MIT · Stars: 2,203
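
The QAT forward pass listed above boils down to fake quantization: round values to an integer grid, then immediately dequantize, so training sees the quantization error. A minimal per-tensor symmetric sketch (illustrative only, not micronet's API):

```python
def fake_quantize(x, num_bits=8):
    # Per-tensor symmetric fake quantization: quantize to a signed integer
    # grid, clamp, then immediately dequantize. QAT runs this in the
    # forward pass so the network learns to tolerate quantization error.
    qmax = (1 << (num_bits - 1)) - 1          # e.g. 127 for 8 bits
    scale = max(abs(v) for v in x) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]
```

In a real QAT setup the rounding step is paired with a straight-through estimator so gradients flow through it unchanged.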

Paper-Writing-Tips

Paper Writing Tips

Stars: 2

Neural-Networks-on-Silicon

Originally a collection of papers on neural network accelerators; now more broadly a personal selection of research on deep learning and computer architecture.

Stars: 1,811

GEMM_WMMA

GEMM implemented with WMMA (Tensor Cores).

Language: Cuda · License: Apache-2.0 · Stars: 5
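
Independent of the CUDA intrinsics, the structure of such a kernel is tile-wise accumulation: each output tile is produced by looping over K tiles and accumulating tile products, just as WMMA accumulates 16×16 fragments. A plain-Python sketch of that loop structure (illustrative only; the real kernel maps tiles to warps and fragments):

```python
def tiled_matmul(A, B, tile=2):
    # Tile-by-tile GEMM: for each output tile (i0, j0), loop over K tiles
    # and accumulate the product of the two input tiles into C. This is
    # the same accumulation pattern a WMMA tensor-core kernel uses with
    # 16x16 fragments and mma operations.
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):          # accumulate over K tiles
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```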

ConvNN

A simple CNN training framework with support for CPU and GPU (cuDNN).

Language: C++ · Stars: 3

Deep-Learning-Accelerator-SW

NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.

Language: Python · License: NOASSERTION · Stars: 160

NiuDianNao

A simple cycle-accurate DaDianNao simulator

Language: C++ · License: MIT · Stars: 10

HolisticTraceAnalysis

A library to analyze PyTorch traces.

Language: Python · License: MIT · Stars: 254

nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Language: Python · License: Apache-2.0 · Stars: 4,318

TPU-Tensor-Processing-Unit

An IC implementation of the TPU.

Language: Verilog · Stars: 84

gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as Tensor Cores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

Language: C++ · License: NOASSERTION · Stars: 1,036

Computer-Science-Textbooks

A collection of CS textbooks for learning.

Stars: 441

Integrated-Circuit-Textbooks

A collection of IC textbooks for learning.

Stars: 88

viztracer

VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution.

Language: Python · License: Apache-2.0 · Stars: 4,680

goldeneye

GoldenEye is a functional simulator with fault injection capabilities for common and emerging numerical formats, implemented for the PyTorch deep learning framework.

Language: Python · License: MIT · Stars: 22

PWLQ

Code for our paper at ECCV 2020: Post-Training Piecewise Linear Quantization for Deep Neural Networks

Language: Python · License: NOASSERTION · Stars: 66
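
The idea behind piecewise linear quantization is to split the value range at a breakpoint so the dense region near zero gets a finer uniform grid than the sparse tail. A two-piece sketch of the concept (illustrative only; the paper additionally chooses the breakpoint to minimize quantization error, which this sketch takes as given):

```python
def pwlq(x, bp, bits=4):
    # Two-piece piecewise linear quantization sketch: magnitudes in
    # [0, bp] use a fine uniform grid, magnitudes in (bp, max] a coarser
    # one, so the dense region near zero keeps more resolution than a
    # single uniform grid over the full range would.
    levels = (1 << bits) - 1
    mx = max(abs(v) for v in x)
    s1 = bp / levels                  # fine step for the dense region
    s2 = (mx - bp) / levels           # coarse step for the tail
    out = []
    for v in x:
        a, sign = abs(v), (1.0 if v >= 0 else -1.0)
        if a <= bp:
            q = round(a / s1) * s1
        else:
            q = bp + round((a - bp) / s2) * s2
        out.append(sign * q)
    return out
```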

awesome-model-quantization

A list of papers, docs, and code about model quantization. This repo aims to provide resources for model quantization research and is continuously improved. PRs adding works (papers, repositories) that the repo has missed are welcome.

Stars: 1,733

AdaptivFloat

Adaptive floating-point based numerical format for resilient deep learning

Language: Python · Stars: 14

Deep-Compression-AlexNet

Deep Compression on AlexNet

Language: Python · License: BSD-2-Clause · Stars: 652

RobustViT

[NeurIPS 2022] Official PyTorch implementation of "Optimizing Relevance Maps of Vision Transformers Improves Robustness". This code finetunes the explainability maps of Vision Transformers to enhance robustness.

Language: Jupyter Notebook · Stars: 122

DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

Language: Jupyter Notebook · License: MIT · Stars: 546