ModelTC / llmc

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Home Page: https://arxiv.org/abs/2405.06001


How to use Tensorrt-LLM as backend

Worromots opened this issue · comments

commented

As described in the title.

You can set `save_fp` to `True` in the llmc config. Then you can use TensorRT-LLM's AMMO toolkit to convert the saved model into a naive quantized engine.
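A minimal sketch of what that config change might look like; the exact key names and section layout depend on the llmc config schema you are using, so treat the surrounding keys here as assumptions:

```yaml
# Hypothetical llmc config fragment: enable saving the fake-quantized
# model in a floating-point format that downstream tools can consume.
save:
    save_fp: True          # export the model for external conversion
    save_path: ./save      # assumed output directory key
```

With the exported files in place, the TensorRT-LLM/AMMO conversion scripts can then be pointed at that output directory to build the engine.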

commented

Thanks for your reply. I have set `save_fp` in llmc to `True`, and these are the files saved by llmc. How can I use TensorRT-LLM's AMMO to convert them into a naive quantized engine?
[image]

commented

A reminder: I still need your help.

The following process requires modifying some code to change the default settings in TensorRT-LLM. To help users use our tool more conveniently, we are preparing an official documentation page about this workflow. Please wait patiently for our updates.