Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).