Cyan4973 / FiniteStateEntropy

New generation entropy codecs: Finite State Entropy and Huff0

a step in normalization of count

BeeBreeze opened this issue

I am new here and still reading the source code. In this line, why is normalizedCounter[s] set to -1? I would have expected 1 rather than -1. Could you please explain it to me? Thanks a lot.

normalizedCounter[s] = -1;

It's a special case, meaning "this symbol has a weight of 1, because it can't be lower than 1, but really, it's so small that it should be a fraction of that". This information affects how the table is built: not all positions in the table are equivalent, so such symbols are assigned the least probable positions.

This is pretty advanced stuff, and it's not "necessary" to know it. You could just as well give these symbols a count of "1" and it would still work: they would simply receive a "normal slot", which reduces the overall compression ratio by a very small amount. No big deal.

This is basic tuning. A year ago I finally wrote a paper about tuning: https://arxiv.org/pdf/2106.06438
For 2048 states and a 256-symbol alphabet, a ~100-byte header allows operating at deltaH/H ~ 0.002 from the Shannon limit.
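One standard way to read a deltaH/H figure: if the true symbol probabilities are p_s and the normalized table effectively codes with q_s = n_s / L (L = number of states, n_s = normalized count, counting a -1 entry as n_s = 1), the expected excess cost per symbol is the Kullback-Leibler divergence, and deltaH/H is that excess relative to the entropy. This is a textbook formula offered as a sketch. The q_s = n_s / L model is only an approximation, since in tANS slot positions also shift the effective probabilities, and the linked paper may define its measure more precisely.

```latex
H = -\sum_s p_s \log_2 p_s, \qquad
\Delta H = \sum_s p_s \log_2 \frac{p_s}{q_s}, \qquad
q_s = \frac{n_s}{L}, \qquad
\frac{\Delta H}{H} = \frac{\sum_s p_s \log_2 (p_s / q_s)}{-\sum_s p_s \log_2 p_s}
```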