intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Home Page: https://intel.github.io/neural-compressor/

Time based TuningCriterion to keep best performing model?

kmn1024 opened this issue · comments

Is there a way to set TuningCriterion so that tuning runs for an allotted amount of time, keeping the best-performing (fastest-running) model that meets a certain accuracy bar (AccuracyCriterion.tolerable_loss)?

Right now, it seems like tuning exits the moment it finds a model that meets the accuracy bar, no matter what I set timeout or max_trials to. This is from my run:

2024-02-21 03:20:03 [INFO] Quantize the model with default config.
2024-02-21 03:25:33 [INFO] |*******Mixed Precision Statistics*******|
2024-02-21 03:25:33 [INFO] +------------------+-------+------+------+
2024-02-21 03:25:33 [INFO] |     Op Type      | Total | INT8 | FP32 |
2024-02-21 03:25:33 [INFO] +------------------+-------+------+------+
2024-02-21 03:25:33 [INFO] |      MatMul      |  120  | 119  |  1   |
.....
2024-02-21 03:25:33 [INFO] +------------------+-------+------+------+
2024-02-21 03:25:33 [INFO] Pass quantize model elapsed time: 330213.59 ms
2024-02-21 03:26:38 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.0036|0.0000, Duration (seconds) (int8|fp32): 64.5549|59.9128], Best tune result is: [Accuracy: 0.0036, Duration (seconds): 64.5549]
2024-02-21 03:26:38 [INFO] |**********************Tune Result Statistics**********************|
2024-02-21 03:26:38 [INFO] +--------------------+----------+---------------+------------------+
2024-02-21 03:26:38 [INFO] |     Info Type      | Baseline | Tune 1 result | Best tune result |
2024-02-21 03:26:38 [INFO] +--------------------+----------+---------------+------------------+
2024-02-21 03:26:38 [INFO] |      Accuracy      | 0.0000   |    0.0036     |     0.0036       |
2024-02-21 03:26:38 [INFO] | Duration (seconds) | 59.9128  |    64.5549    |     64.5549      |
2024-02-21 03:26:38 [INFO] +--------------------+----------+---------------+------------------+
2024-02-21 03:26:38 [INFO] Save tuning history to /home/ck/git/StyleTTS2/nc_workspace/2024-02-21_03-04-35/./history.snapshot.

MSE: 0.00363

2024-02-21 03:26:38 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-02-21 03:26:38 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-02-21 03:26:38 [INFO] Save deploy yaml to /home/ck/git/StyleTTS2/nc_workspace/2024-02-21_03-04-35/deploy.yaml

My code looks like this:

from neural_compressor import quantization
from neural_compressor.config import (
    AccuracyCriterion, PostTrainingQuantConfig, TuningCriterion,
)

accuracy_criterion = AccuracyCriterion(
    higher_is_better=False,  # optional.
    criterion="absolute",  # optional. Available values are 'relative' and 'absolute'.
    tolerable_loss=0.005,  # optional.
)

tuning_criterion = TuningCriterion(
    timeout=36000,  # optional. tuning timeout (seconds). When set to 0, early stopping is enabled.
    max_trials=100,  # optional. max tuning times. combined with the `timeout` field to decide when to exit tuning.
    objective="performance",
    strategy="basic",
)

quant_level = "auto"
approach = "auto"

conf = PostTrainingQuantConfig(
    backend="default",
    accuracy_criterion=accuracy_criterion,
    tuning_criterion=tuning_criterion,
    quant_level=quant_level,
    approach=approach,
)

q_model = quantization.fit(
    model=onnx_model,
    conf=conf,
    calib_dataloader=dataloader,
    eval_func=eval_func,
)

Hello, @kmn1024. Thanks for raising this question. To enable the behavior you describe, please set quant_level to 1.

Here is a unit test demonstrating this usage.
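For reference, the suggested change boils down to one field on PostTrainingQuantConfig. This snippet reuses the values from the question above; only quant_level differs (a sketch, assuming neural-compressor 2.x import paths):

```python
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.config import AccuracyCriterion, TuningCriterion

conf = PostTrainingQuantConfig(
    backend="default",
    accuracy_criterion=AccuracyCriterion(
        higher_is_better=False,
        criterion="absolute",
        tolerable_loss=0.005,
    ),
    tuning_criterion=TuningCriterion(
        timeout=36000,
        max_trials=100,
        objective="performance",
        strategy="basic",
    ),
    quant_level=1,  # was "auto": level 1 keeps tuning for the best passing model
)
```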

Thanks! I tried it and it seems to mostly work, except that when the timeout is reached, it incorrectly reports an error saying no model meeting the accuracy bar was found.

This is the output I now see:

...
2024-02-21 05:24:14 [INFO] Tune 8 result is: [Accuracy (int8|fp32): 0.0036|0.0000, Duration (seconds) (int8|fp32): 63.6809|59.8382], Best tune result is: [Accuracy: 0.0036, Duration (seconds): 62.5746]
2024-02-21 05:24:14 [INFO] |**********************Tune Result Statistics**********************|
2024-02-21 05:24:14 [INFO] +--------------------+----------+---------------+------------------+
2024-02-21 05:24:14 [INFO] |     Info Type      | Baseline | Tune 8 result | Best tune result |
2024-02-21 05:24:14 [INFO] +--------------------+----------+---------------+------------------+
2024-02-21 05:24:14 [INFO] |      Accuracy      | 0.0000   |    0.0036     |     0.0036       |
2024-02-21 05:24:14 [INFO] | Duration (seconds) | 59.8382  |    63.6809    |     62.5746      |
2024-02-21 05:24:14 [INFO] +--------------------+----------+---------------+------------------+
2024-02-21 05:24:14 [INFO] Save tuning history to /home/ck/git/StyleTTS2/nc_workspace/2024-02-21_03-04-35/./history.snapshot.

2024-02-21 05:27:23 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
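The exit logic being exercised here can be sketched in plain Python (a hypothetical illustration of the intended behavior, not the library's actual code): the strategy keeps the fastest model among the trials that satisfy the accuracy criterion, and once the time or trial budget is exhausted it should exit with success whenever such a model exists.

```python
def pick_best(trials, tolerable_loss):
    """Scan tuning trials and keep the fastest model among those that
    satisfy an absolute accuracy criterion (acc_loss <= tolerable_loss)."""
    best = None  # (duration_s, trial_name)
    for name, acc_loss, duration_s in trials:
        if acc_loss <= tolerable_loss and (best is None or duration_s < best[0]):
            best = (duration_s, name)
    return best

# Numbers loosely modeled on the logs above.
trials = [
    ("tune_1", 0.0036, 64.5549),  # meets the bar, but slow
    ("tune_5", 0.0080, 55.0000),  # fast, but fails the accuracy bar
    ("tune_8", 0.0036, 62.5746),  # meets the bar; fastest among passers
]
best = pick_best(trials, tolerable_loss=0.005)
# On timeout the strategy should report success, since a passing model exists.
print("found" if best else "not found", best)
```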

Thanks for reporting this issue. I fixed it in #1620. Kindly try it at your convenience.

Yes! That works =)