zeroine's repositories
aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Language: Python | License: NOASSERTION
flash-attention
Fast and memory-efficient exact attention
Language: Python | License: BSD-3-Clause
MQBench
Model Quantization Benchmark
Language: Shell | License: Apache-2.0
ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Language: Python | License: Apache-2.0
TensorRT
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
Language: C++ | License: Apache-2.0