huggingface / exporters

Export Hugging Face models to Core ML and TensorFlow Lite

Support for lower-bit quantization, 8-bit or 4-bit at least

Proryanator opened this issue · comments

This tool is amazing. I had tried scripting the conversion by hand with the coremltools library and ran into all kinds of fun issues; with this tool it is all orchestrated and abstracted for you. Excellent 👍

I noticed, however, that quantization is only supported down to 16 bits, and I would love to have smaller options. I believe Core ML is capable of lower bit widths, so it may just be a matter of adding that call to this wrapper.
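For context on what "smaller options" would compute, here is a minimal NumPy sketch of linear symmetric weight quantization (the basic scheme behind 8-bit weight compression). This is illustrative only, not the exporters or coremltools API; the function name and sample tensor are made up:

```python
import numpy as np

def quantize_linear_symmetric(w: np.ndarray, nbits: int = 8):
    """Illustrative sketch: map float weights to signed nbits integers
    with a single per-tensor scale (symmetric linear quantization)."""
    qmax = 2 ** (nbits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax         # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.array([0.5, -1.0, 0.25, 0.99], dtype=np.float32)
q, scale = quantize_linear_symmetric(w, nbits=8)
w_restored = q.astype(np.float32) * scale  # dequantized approximation of w
```

Each weight is stored as a small integer plus a shared scale, which is why 8-bit roughly halves the size of a float16 model; 4-bit halves it again at the cost of coarser steps.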

I did look in convert.py and I see a use_legacy_format flag being checked before the 16-bit quantization is applied. Is there something different about how the ML Program format handles, or performs, lower-bit quantization?

I realized that you can still quantize a Core ML model after it has been created, so this issue can probably be disregarded. I will try quantizing some existing Core ML models I found.

So having this tool do the conversion, then quantizing further afterwards, should work!