Samsung / ONE

On-device Neural Engine

[onert] Introduce full quantization

hseok-oh opened this issue

Let's support full quantization in the runtime.

  • Introduce full quantization type: #11497
  • Full quantization from a circle model that includes minmax data, including weight quantization
  • Generate a circle model (buffer) that includes per-layer minmax and quantized weights for full quantization, from an f32 circle model and minmax data
    • Use minmax-embedder library
  • Remove HDF5 dependency: #12574
  • Revise observers to introduce execution config API: #13039
  • Introduce API to collect minmax data
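
For readers unfamiliar with how recorded minmax data turns into quantization parameters, here is a minimal sketch of generic asymmetric (affine) uint8 quantization. It is not taken from onert or the minmax-embedder library; the function names `affine_params`, `quantize`, and `dequantize` are hypothetical, and the actual runtime may use a different rounding or range policy.

```python
from typing import List, Tuple


def affine_params(min_val: float, max_val: float,
                  qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Derive (scale, zero_point) from a recorded [min, max] range.

    Hypothetical helper: illustrates the common affine scheme only.
    """
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    if max_val == min_val:
        # Degenerate range (all zeros): any positive scale works.
        return 1.0, qmin
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    # Keep the zero point inside the integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> List[int]:
    """Map float values to integers, clamping to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


if __name__ == "__main__":
    # Example: a layer whose recorded activation range was [-128.0, 127.0].
    scale, zp = affine_params(-128.0, 127.0)
    print(scale, zp)
    print(quantize([-128.0, 0.0, 127.0], scale, zp))
```

The same parameters can be applied to weights (using their own min and max) and to activations (using the minmax data collected at run time), which is roughly what "full quantization" refers to here.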

Draft: #12903

Example

$ MINMAX_DUMP=1 ./Product/out/bin/onert_run -r 100 mobilenet_v1_1.0_224.circle
$ ./Product/out/bin/onert_run -q uint8 mobilenet_v1_1.0_224.circle
$ ./Product/out/bin/onert_run mobilenet_v1_1.0_224_quantized_q8.circle
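
The first command above presumably records per-tensor minimum and maximum values across the repeated runs. As a rough illustration of that idea only, the sketch below keeps a running min/max per tensor name. `MinMaxRecorder` and the tensor name `conv1` are hypothetical and do not reflect onert's observer or its minmax API.

```python
from typing import Dict, Iterable, Tuple


class MinMaxRecorder:
    """Hypothetical collector: tracks the running (min, max) per tensor name."""

    def __init__(self) -> None:
        self._ranges: Dict[str, Tuple[float, float]] = {}

    def observe(self, name: str, values: Iterable[float]) -> None:
        """Fold one run's tensor values into the recorded range."""
        vals = list(values)
        if not vals:
            return  # nothing to record for an empty tensor
        lo, hi = min(vals), max(vals)
        if name in self._ranges:
            old_lo, old_hi = self._ranges[name]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        self._ranges[name] = (lo, hi)

    def ranges(self) -> Dict[str, Tuple[float, float]]:
        """Return a copy of the recorded ranges."""
        return dict(self._ranges)


if __name__ == "__main__":
    recorder = MinMaxRecorder()
    # Two simulated runs producing outputs for the same tensor.
    recorder.observe("conv1", [0.1, -0.5, 0.7])
    recorder.observe("conv1", [2.0, 0.0])
    print(recorder.ranges())
```

Running more iterations with representative inputs widens the recorded ranges toward the values the model actually produces, which is why a calibration pass typically uses many runs.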