Samsung / ONE

On-device Neural Engine

[circle2circle] Introduce a pass to simulate mixed-precision operators

jinevening opened this issue · comments

What

Let's introduce a new pass, RemoveQDQForMixedPrecisionOp.

Why

When we make a fake-quantized model, duplicate QDQ (Quantize-Dequantize) patterns sometimes appear, as shown below.

image

In the above example, the first QDQ is for q8, and the second QDQ is for q16. On some backends, an FC layer can directly produce q16 output even though its inputs are q8 (for higher accuracy). Such an operator is often called a 'mixed-precision operator'.
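To illustrate why the duplicate QDQ matters, here is a minimal numeric sketch. It assumes symmetric quantization for both q8 and q16 and uses a made-up FC output value and range; the helper `fake_quantize` is hypothetical and is not part of ONE. It compares the error of q8-then-q16 (the duplicate pattern) with q16 alone (the mixed-precision behavior).

```python
def fake_quantize(x, scale, qmin, qmax):
    """Quantize x onto an integer grid, then dequantize back to float (one QDQ)."""
    q = round(x / scale)
    q = max(qmin, min(qmax, q))  # clamp to the representable integer range
    return q * scale


# Hypothetical FC output value and value range, for illustration only.
value = 0.123456
max_abs = 1.0

# Assumption: symmetric scales for both precisions (a simplification).
scale_q8 = max_abs / 127
scale_q16 = max_abs / 32767

# Duplicate pattern: FC output passes through QDQ(q8), then QDQ(q16).
q8_then_q16 = fake_quantize(
    fake_quantize(value, scale_q8, -127, 127), scale_q16, -32767, 32767
)

# Mixed-precision behavior: FC output is quantized to q16 only.
q16_only = fake_quantize(value, scale_q16, -32767, 32767)

err_duplicate = abs(q8_then_q16 - value)
err_mixed = abs(q16_only - value)
print(err_duplicate, err_mixed)
```

In this sketch the q8 step dominates the error of the duplicate pattern, so the fake-quantized model underestimates the accuracy of a backend whose FC emits q16 directly.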

To simulate the behavior of a mixed-precision operator, we need a pass that removes the first QDQ (the q8 one) from the above pattern.
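The rewrite could be sketched roughly as follows. This is a toy graph in plain Python, not the luci/circle2circle API: the `Node` class, the op names, and the `"q8"`/`"q16"` dtype strings are all assumptions made for illustration. It also assumes the producer is a FullyConnected node and that the first Dequantize has no other consumers, which a real pass would need to check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical graph node; not a luci/circle type."""
    op: str  # e.g. "Input", "FullyConnected", "Quantize", "Dequantize"
    inputs: List["Node"] = field(default_factory=list)
    dtype: Optional[str] = None  # set on Quantize nodes: "q8" or "q16"


def match_qdq(node):
    """If `node` is the Dequantize of a Quantize->Dequantize pair, return the Quantize."""
    if node.op == "Dequantize" and node.inputs and node.inputs[0].op == "Quantize":
        return node.inputs[0]
    return None


def remove_qdq_for_mixed_precision_op(output):
    """Walk back from `output`. Where FC -> QDQ(q8) -> QDQ(q16) is found,
    rewire it to FC -> QDQ(q16). Returns True if the graph changed."""
    changed = False
    stack, seen = [output], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        outer_q = match_qdq(node)  # candidate second (q16) QDQ
        if outer_q is not None and outer_q.inputs:
            inner_q = match_qdq(outer_q.inputs[0])  # candidate first (q8) QDQ
            if (inner_q is not None and inner_q.inputs
                    and inner_q.inputs[0].op == "FullyConnected"
                    and inner_q.dtype == "q8" and outer_q.dtype == "q16"):
                # Bypass the q8 QDQ: feed the FC output straight into Quantize(q16).
                outer_q.inputs[0] = inner_q.inputs[0]
                changed = True
        stack.extend(node.inputs)
    return changed


# Build FC -> Q(q8) -> DQ -> Q(q16) -> DQ and run the pass.
x = Node("Input")
fc = Node("FullyConnected", [x])
q8 = Node("Quantize", [fc], "q8")
dq8 = Node("Dequantize", [q8])
q16 = Node("Quantize", [dq8], "q16")
dq16 = Node("Dequantize", [q16])

changed = remove_qdq_for_mixed_precision_op(dq16)
changed_again = remove_qdq_for_mixed_precision_op(dq16)
```

After the first call, `q16` reads directly from `fc`; the second call finds nothing left to rewrite.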

I used the [circle2circle] tag because I'm not sure whether it is OK to expose this option to users (one-optimize).