zhanjl / tf_quantization

Illustration of model and layer quantization in TensorFlow.

TensorFlow Quantization

Here I illustrate and compare the effect of quantization and model reduction on speed and model size.

Layer Quantization

Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy.

| S No. | Name            | Train Accuracy | Test Accuracy | Model Size |
|-------|-----------------|----------------|---------------|------------|
| 1     | Float Model     | 0.9966         | 0.9800        | 84.8 KB    |
| 2     | Quantized Model | 0.9966         | 0.9732        | 23.1 KB    |
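
As a sketch, the conversion step could look like the following; the model path and output file name are illustrative, not taken from this repo:

```python
import tensorflow as tf

# Load the trained float Keras model (path is illustrative).
model = tf.keras.models.load_model("float_model.h5")

# Post-training dynamic-range quantization: weights are stored as
# 8-bit integers, which is what shrinks the saved model size.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_quant_model)
```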

Model Quantization

Quantization works by reducing the precision of the numbers used to represent a model's parameters, which by default are 32-bit floating point numbers. This results in a smaller model size and faster computation.
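To make "reducing the precision" concrete, here is a minimal sketch of the affine (scale/zero-point) mapping commonly used for 8-bit quantization; the helper names and example values are purely illustrative:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map float32 values to int8 using an affine transform."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from int8."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.array([-1.2, 0.0, 0.5, 2.7], dtype=np.float32)

# Scale and zero point chosen so the float range maps onto [-128, 127].
scale = (weights.max() - weights.min()) / 255.0
zero_point = int(np.round(-128 - weights.min() / scale))

q = quantize(weights, scale, zero_point)
print(q)                                 # int8 values, 1 byte each
print(dequantize(q, scale, zero_point))  # close to, but not equal to, the originals
```

Each parameter now takes 1 byte instead of 4, at the cost of a small rounding error.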

This was tested on an Intel Core i5-8600K CPU and an NVIDIA GTX 1080 GPU.

| S No. | Name            | Test Accuracy | Speed       | Model Size |
|-------|-----------------|---------------|-------------|------------|
| 1     | Normal Model    | 0.9823        | 0.3968 secs | 272.8 KB   |
| 2     | Quantized Model | 0.9816        | 1.366 secs  | 23.4 KB    |
| 3     | TFLite Model    | 0.9823        | 0.8510 secs | 83.4 KB    |

Inference speed depends on the model architecture and the device: quantized kernels are optimized primarily for mobile and edge hardware, so on a desktop CPU/GPU a quantized model can run slower than the float model, as seen here.
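
The timings above can be reproduced roughly with the TFLite interpreter API; a minimal sketch follows (the model path and iteration count are assumptions, not from this repo):

```python
import time

import numpy as np
import tensorflow as tf

# Load the converted model (path is illustrative).
interpreter = tf.lite.Interpreter(model_path="quantized_model.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

# Random input of the right shape; dynamic-range models still take float32.
x = np.random.rand(*input_detail["shape"]).astype(np.float32)

start = time.perf_counter()
for _ in range(100):
    interpreter.set_tensor(input_detail["index"], x)
    interpreter.invoke()
    _ = interpreter.get_tensor(output_detail["index"])
print(f"Mean latency: {(time.perf_counter() - start) / 100:.4f} secs")
```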


License: MIT License

