MartinezTorres / marlin

Marlin: high throughput entropy compressor

Some duplication in the pre-generated dicts

clbr opened this issue

At least the first two pre-generated dictionaries seem identical. Not sure if that's a generation mistake or intended.

That's a very nice observation. Sometimes, if the dictionary size is small enough, two sources with similar but not identical entropy levels may result in the same dictionary. It is possible to optimize the dictionary generation process (so that the dictionaries are not equidistant), which would get rid of the duplicated dictionaries. I've experimented with this in the past, but the resulting code ended up being very ugly and the benefit very limited, so I dropped it.

So for now, yes, some pre-generated dictionaries may be identical. It's not the most efficient use of RAM, but as of now it is an intended byproduct.
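
For anyone who does care about the RAM, one possible workaround is to detect identical dictionaries after generation and let the duplicates point at a single shared copy. The snippet below is only a rough sketch of that idea, not Marlin's actual code: the `Dictionary` alias (a flat byte table) and the `canonicalIndices` helper are hypothetical names made up for illustration, and the real dictionary layout in the library may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a pre-generated dictionary: a flat byte table.
// This is an assumption for illustration, not Marlin's real data structure.
using Dictionary = std::vector<std::uint8_t>;

// For each dictionary, return the index of the first dictionary with
// identical contents. Entries that map to themselves are unique (or the
// first of their group); the others are duplicates that could share storage.
std::vector<std::size_t> canonicalIndices(const std::vector<Dictionary>& dicts) {
    std::map<Dictionary, std::size_t> firstSeen;  // contents -> first index
    std::vector<std::size_t> canonical;
    canonical.reserve(dicts.size());
    for (std::size_t i = 0; i < dicts.size(); ++i) {
        // emplace leaves the existing entry untouched if the contents
        // were already seen, so it->second is always the first index.
        auto it = firstSeen.emplace(dicts[i], i).first;
        canonical.push_back(it->second);
    }
    return canonical;
}
```

A lookup table built this way keeps one entry per entropy level, so the selection logic would not need to change; only the storage behind duplicate entries would be shared.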