nfrechette / acl

Animation Compression Library

Rework packing to improve average/worst error

nfrechette opened this issue

See this PR for original inspiration: #348 by @ddeadguyy.

It's worth noting that ACL_PACKING is only a half-step towards minimizing quantization error. If ACL discretized a range between 0.0f and 1.0f with num_bits == 1, it would do this: 0 -> 0.0f, 1 -> 1.0f. This results in an average error of 0.25f, and worst-case error of 0.5f, for all floats between 0.0f and 1.0f. Consider this instead: 0 -> 0.25f, 1 -> 0.75f. Error exists at the endpoints, but average error is only 0.125f, and worst-case error is only 0.25f.
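
To make the two mappings concrete, here is a minimal sketch (the helper names are mine, not ACL's): endpoint packing spreads the reconstructed values across the full range, while midpoint packing reconstructs to the center of each bucket, which halves the worst-case error.

```cpp
#include <algorithm>
#include <cstdint>

// Endpoint packing for num_bits == 1: 0 -> 0.0f, 1 -> 1.0f.
// Worst-case error is 0.5f (for an input of 0.5f), average error is 0.25f.
inline float unpack_endpoints_1bit(uint32_t q) { return q != 0 ? 1.0f : 0.0f; }

// Midpoint packing for num_bits == 1: 0 -> 0.25f, 1 -> 0.75f.
// Worst-case error drops to 0.25f, average error to 0.125f.
inline float unpack_midpoints_1bit(uint32_t q) { return q != 0 ? 0.75f : 0.25f; }

// The same idea for an arbitrary bit count: reconstruct to the center of each
// of the 2^num_bits buckets instead of spreading values over the full range.
inline uint32_t pack_midpoint(float value, uint32_t num_bits)
{
    const float num_buckets = float(1u << num_bits);
    const uint32_t max_index = (1u << num_bits) - 1;
    return std::min(uint32_t(value * num_buckets), max_index);
}

inline float unpack_midpoint(uint32_t q, uint32_t num_bits)
{
    const float num_buckets = float(1u << num_bits);
    return (float(q) + 0.5f) / num_buckets; // bucket center
}
```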

It has been some time since I played with that code. I know some code expects to be able to reconstruct the original min/max boundaries but I am not sure if that is really required anymore. We'll have to try and see what breaks.

That being said, there is already some 'fudge' present and probably some loss as a result. We normalize our values twice: first over the whole clip, then per segment. The clip range information is stored in full float32 precision but the segment range information is quantized to 8 bits per component. As a result of this loss, we inflate the segment range min/max to account for the quantization error. This means that a true 0.0 or 1.0 within a normalized sample would potentially bring us outside the original clip range. As a result, I don't know if we ever have true 0.0 and 1.0 as quantized sample values. This might be happening while we remain within our error threshold (or not).
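
For illustration, here is a rough sketch of that two-level normalization and the range inflation; the struct layout, names, and rounding details are assumptions on my part, not the actual ACL code (and divide-by-zero guards are omitted).

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative only: the layout and rounding choices below are assumptions.
struct clip_range_t { float min; float extent; };         // stored at full float32 precision
struct segment_range_t { uint8_t min; uint8_t extent; };  // quantized to 8 bits per component

// Quantize the segment range conservatively: round the min down and the extent up
// so the stored (inflated) range always contains the true range.
inline segment_range_t quantize_segment_range(float seg_min, float seg_extent)
{
    const uint8_t q_min = uint8_t(seg_min * 255.0f);                                // truncation rounds down
    const uint8_t q_extent = uint8_t(std::min(seg_extent * 255.0f + 1.0f, 255.0f)); // rounds up
    return segment_range_t{ q_min, q_extent };
}

// Normalize a raw sample first over the clip range, then over the inflated segment range.
// Because the segment range was inflated, a normalized 0.0f or 1.0f here can map back to
// a value slightly outside the original clip range, and an exact 0.0f/1.0f may never
// actually occur among the quantized samples.
inline float normalize_sample(float raw, const clip_range_t& clip, const segment_range_t& segment)
{
    const float clip_normalized = (raw - clip.min) / clip.extent;
    const float seg_min = float(segment.min) / 255.0f;
    const float seg_extent = float(segment.extent) / 255.0f;
    return (clip_normalized - seg_min) / seg_extent;
}
```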

I last played with this when I tried a 'dirty' int->float conversion. The expensive instruction to convert an int into a float32 can be avoided and emulated by ORing the exponent bits and subtracting 1.0. We can basically treat the quantized value directly as the 23-bit mantissa as-is. We can use every bit of the mantissa to represent numbers between [1.0, 2.0) and subtract 1.0 to remap. This works if our quantized value is between [0.0, 1.0) but it breaks with [0.0, 1.0]. As such, a true 1.0 cannot be reconstructed with that approach (it would end up being 2.0, which has a different exponent value).
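
A sketch of that trick (the helper name is mine), assuming the quantized value holds at most 23 bits:

```cpp
#include <cstdint>
#include <cstring>

// Sketch of the 'dirty' int->float conversion described above.
// The quantized value is dropped into the 23-bit mantissa of a float whose exponent
// bits encode 1.0f, producing a number in [1.0f, 2.0f); subtracting 1.0f remaps it
// to [0.0f, 1.0f). A true 1.0f cannot be produced because that would require the
// exponent of 2.0f.
inline float dirty_int_to_float(uint32_t quantized, uint32_t num_bits)
{
    const uint32_t mantissa = quantized << (23 - num_bits); // left-align into the mantissa (num_bits <= 23)
    const uint32_t bits = 0x3F800000u | mantissa;           // OR in the exponent bits of 1.0f
    float result;
    std::memcpy(&result, &bits, sizeof(result));            // safe type punning
    return result - 1.0f;                                   // [1.0f, 2.0f) -> [0.0f, 1.0f)
}
```

The reconstructed value is quantized / 2^num_bits, which can reach 1.0f - 2^-num_bits but never an exact 1.0f.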

I think your idea makes a lot of sense, and we should give it a real shot to see if we can make it work.

This branch will be used for the development work: fix/improve-quantization-error

We can basically treat the quantized value directly as the 23-bit mantissa as-is... This works if our quantized value is between [0.0, 1.0) but it breaks with [0.0, 1.0].

This is where we compromise at the endpoints. Instead of [0.0, 1.0], consider [error_term, 1.0f - error_term]. For even better precision, and to function correctly at bitrate 24, consider [-0.5 + error_term, 0.5 - error_term]. I'll poke at this in my fork of fix/improve-quantization-error. The other packing and bitrate issues are related, so expect them to bleed into this.
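
As an illustration only, the endpoint compromise could look something like this; the helper name and the choice of error_term (half a quantization bucket) are my assumptions, not the code from the fork.

```cpp
#include <cstdint>

// Values are reconstructed over [error_term, 1.0f - error_term] instead of [0.0f, 1.0f],
// trading some error at the exact endpoints for a smaller error everywhere else.
inline float unpack_with_endpoint_compromise(uint32_t quantized, uint32_t num_bits)
{
    const float error_term = 0.5f / float(1u << num_bits);  // assumed: half a bucket
    const float max_value = float((1u << num_bits) - 1);
    const float range = 1.0f - 2.0f * error_term;
    return error_term + (float(quantized) / max_value) * range;
}

// The centered variant mentioned above, [-0.5f + error_term, 0.5f - error_term],
// is the same mapping shifted down by 0.5f.
```

With num_bits == 1 and error_term of half a bucket, this reproduces the 0.25f/0.75f example from earlier in the thread.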

First, the good news. My precision improvement works, and 5 segments in my largest edge case animation get smaller.
Next, the bad news. 17 segments in my largest edge case animation get bigger, and total animation size in my test case increases by 4.8%.

I'll make a pull request for it anyway, so you can try it on your end, but I've abandoned this approach. I'm going back to the smaller-scale, packing-only precision improvement that I used in 1.3.5.

Oh well, it was worth a try.

Interesting! I'll take a look when I get the chance and keep you posted.