Cyan4973 / FiniteStateEntropy

New generation entropy codecs : Finite State Entropy and Huff0

GCC and LLVM CLANG Stats

X4 opened this issue

commented

Hi,

I ran these on a mediocre 1.3 GHz Core2Duo with 4 GB DDR on Gentoo Linux with the latest kernel and thought that sharing the stats might be useful. I have not integrated Yeppp! into FSE, but I would be curious whether it could bring any advantage (the -lyeppp flag was used in the hope that it would have a positive effect). I tried various combinations of gcc/clang flags, and none had a positive effect except -O2 in combination with -lyeppp, and most visibly -funroll-loops in combination with clang version 3.3.

FSE : Finite State Entropy, capability demo by Yann Collet (Jan 12 2014)

File already compressed

GCC
../data/win98-lz : 4671615 -> 4671758 (100.0%), 73.0 MB/s , 1420.2 MB/s
GCC -funroll-loops
../data/win98-lz : 4671615 -> 4671758 (100.0%), 76.5 MB/s , 1405.2 MB/s
GCC -funroll-loops -lyeppp
../data/win98-lz : 4671615 -> 4671758 (100.0%), 75.9 MB/s , 1420.9 MB/s

CLANG
../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.4 MB/s , 1409.0 MB/s
CLANG -funroll-loops
../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.4 MB/s , 1418.3 MB/s
CLANG -funroll-loops -lyeppp
../data/win98-lz : 4671615 -> 4671758 (100.0%), 78.3 MB/s , 1431.4 MB/s

File is uncompressed

GCC
../data/win98-lz : 12536244 -> 4671591 (37.26%), 73.0 MB/s , 96.9 MB/s
GCC -funroll-loops
../data/win98-lz : 12536244 -> 4671591 (37.26%), 76.5 MB/s , 112.9 MB/s
GCC -funroll-loops -lyeppp
../data/win98-lz : 12536244 -> 4671591 (37.26%), 76.5 MB/s , 112.9 MB/s

CLANG
../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.2 MB/s , 107.9 MB/s
CLANG -funroll-loops
../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.2 MB/s , 108.0 MB/s
CLANG -funroll-loops -lyeppp
../data/win98-lz : 12536244 -> 4671591 (37.26%), 78.6 MB/s , 108.0 MB/s

EDIT:
I have been thinking for several months about entropy, the universe, and the use and state of compression in computer science. I also used entropy as a main theme in my thesis. What strikes me is how the difference between this and a neural network would diminish if you chained multiple FSEs into a multi-layer network. The current thought leader in this area is Prof. Dr. Jürgen Schmidhuber, whose work, and that of his students, you can study here: http://www.idsia.ch/~juergen/onlinepub.html
Since the main problem in the study of entropy is the topology of the data, it raises the question of why topological data analysis is so rarely used in the field as a method of exploiting the nature of a dataset to achieve higher compression ratios. It would be a pleasure to exchange ideas on entropy with you. Thanks for this great contribution! I have recently found myself in growing awe of groundbreaking algorithms that come up with linear, near-optimal, and occasionally even near-perfect solutions.

The results seem to indicate that the file you tested was already compressed, hence the result (100% means no compression was performed).

I guess you probably tested the file win98-lz-run.fse directly.
This file is already compressed by fse (hence its .fse extension).

It is necessary to decompress the file first, using the following command line:
fse -d win98-lz-run.fse
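
For what it's worth, the same effect is visible programmatically: FSE signals high-entropy input as incompressible. Below is a minimal sketch, assuming the modern fse.h interface (FSE_compress, FSE_compressBound, FSE_isError); the 2014-era signatures may have differed slightly:

```c
/* Sketch: compress a buffer with FSE and report the ratio.
   Assumes the modern fse.h API; older signatures may differ. */
#include <stdio.h>
#include <stdlib.h>
#include "fse.h"

int main(void)
{
    /* Low-entropy sample data; an already-compressed file would
       instead hit the "not compressible" branch below. */
    static unsigned char src[65536];
    for (size_t i = 0; i < sizeof(src); i++) src[i] = (unsigned char)(i % 8);

    size_t const dstCapacity = FSE_compressBound(sizeof(src));
    void* const dst = malloc(dstCapacity);
    if (dst == NULL) return 1;

    size_t const cSize = FSE_compress(dst, dstCapacity, src, sizeof(src));
    if (FSE_isError(cSize)) {
        printf("error: %s\n", FSE_getErrorName(cSize));
    } else if (cSize <= 1) {
        /* 0 = not compressible, 1 = RLE-able single symbol */
        printf("not compressible (ratio ~100%%)\n");
    } else {
        printf("%u -> %u (%.2f%%)\n", (unsigned)sizeof(src), (unsigned)cSize,
               (double)cSize / sizeof(src) * 100.0);
    }
    free(dst);
    return 0;
}
```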

Regards

commented

Thanks for the pointer @Cyan4973.
I indeed misunderstood how the benchmark works (I thought it unpacked and repacked). I will repeat the tests.

OK, updated the original post.

Thanks for the update and additional comment.

Indeed, neural networks are mentioned in data compression, but more often on the modeling side.
The main idea up to now has been that arithmetic coding basically "solved" the entropy problem as defined by Shannon, which is probably why so little innovation has been brought to this area since.
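
To make the Shannon bound concrete: the order-0 limit is H = -Σ p(s) · log2 p(s) bits per symbol, and an arithmetic (or FSE) coder approaches it. Here is a quick sketch of that estimate over bytes (my illustration, not code from this repository):

```c
/* Order-0 Shannon entropy of a byte buffer, in bits per byte.
   Illustration only, not code from the FSE repository. */
#include <math.h>
#include <stddef.h>

double shannon_entropy(const unsigned char* buf, size_t len)
{
    size_t count[256] = {0};
    for (size_t i = 0; i < len; i++) count[buf[i]]++;

    double h = 0.0;
    for (int s = 0; s < 256; s++) {
        if (count[s] == 0) continue;
        double p = (double)count[s] / (double)len;
        h -= p * log2(p);          /* symbol s contributes -p*log2(p) */
    }
    return h;   /* ~8.0 means no order-0 gain is left */
}
```

An already-compressed file sits near 8 bits per byte, leaving nothing for an order-0 coder to remove, which is exactly why the first benchmark above reported ~100%.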

Also, as a minor point: the latest version of the test tool has been updated to accept FSE-compressed files as input.

It should now offer to decompress them and, if accepted, benchmark the uncompressed version directly.

commented

Nice addition! Thanks =)
I'll wrap my head around it and try to tackle the problem from a different angle too! I think that non-Euclidean maths is best suited for better compression. I believe I have an idea of how the eye encodes data and how memories are encoded, but I have no clue how other sensory data and sound are encoded; I believe they use the same encoding, though. The interesting part is that everything is similar and "similarly simple", yet there is an original, ingenious idea in nature, and that is what makes the whole thing such a complex subject. It recombines data in a complex way, which leads to the messy structure that we are not able to decipher.