ebiggers / libdeflate

Heavily optimized library for DEFLATE/zlib/gzip compression and decompression

Add option to choose which enhanced instruction set should be used

SunBlack opened this issue

As far as I can currently see, the library uses the standard set of processor extensions, but with some limitations:

  • It is not possible to set which processor extension should be used (e.g. if you compile on a modern PC for a customer, it may not run on the customer's PC as their processor does not have the extension) => setting via CMake would be good
  • The processor extensions are only checked for GCC-compatible compilers (defined(__i386__) || defined(__x86_64__)), but not for other compilers like MSVC (see documentation) => could also be detected by CMake
  • By default, not all processor extensions seem to be used (I don't see any usage of -march=native or -mavx2), and my tests with Godbolt show that e.g. AVX2 does not seem to be enabled by default (they are also not set for MSVC, see documentation). This could be tested via check_cxx_source_runs, as I don't see any equivalent to -march=native on MSVC.

if you compile on a modern PC for a customer, it may not run on the customer's PC as their processor does not have the extension

This is not true. You can compile with -march=native or similar if you want to, but that's not the default.

The processor extensions will be only checked for GCC compatible compilers (defined(__i386__) || defined(__x86_64__)), but not on other compilers like MSVC (see documentation) => Could be also detected by CMake

This is a duplicate of #219, so please comment on that issue if this is something you want. However, I think you have misunderstood the work involved, as using different architecture preprocessor macros would just be a very tiny part of the work needed for full MSVC support.
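To illustrate the "architecture preprocessor macros" part of this discussion: GCC-compatible compilers predefine __i386__ and __x86_64__, while MSVC predefines _M_IX86 and _M_X64. A minimal sketch of a check that covers both families follows. The names ARCH_X86_64, ARCH_X86_32, arch_name and is_x86 are hypothetical and are not taken from libdeflate, and this only addresses the macro question, not the rest of the work needed for MSVC support.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro names for illustration; not libdeflate's own. */
#if defined(__x86_64__) || defined(_M_X64)
#  define ARCH_X86_64 1   /* GCC/Clang macro or MSVC macro for 64-bit x86 */
#elif defined(__i386__) || defined(_M_IX86)
#  define ARCH_X86_32 1   /* GCC/Clang macro or MSVC macro for 32-bit x86 */
#endif

/* Returns a short name for the architecture this file was compiled for. */
static const char *arch_name(void)
{
#if defined(ARCH_X86_64)
    return "x86_64";
#elif defined(ARCH_X86_32)
    return "x86_32";
#else
    return "other";
#endif
}

/* Returns 1 when the compile target is any flavor of x86, else 0. */
static int is_x86(void)
{
    const char *name = arch_name();
    assert(name[0] != '\0');   /* every branch above returns a non-empty literal */
    return strncmp(name, "x86", 3) == 0;
}
```

Note that this is decided entirely at compile time: it says which architecture the code targets, not which extensions the CPU running it will have.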

By default, not all processor extensions seem to be used (I don't see any usage of -march=native or -mavx2), and my tests with Godbolt show that e.g. AVX2 does not seem to be enabled by default (they are also not set for MSVC, see documentation). This could be tested via check_cxx_source_runs, as I don't see any equivalent to -march=native on MSVC.

libdeflate already uses SSE, AVX, NEON, etc. in various places where they help a lot, and usually runtime CPU detection is used, so an option like -mavx2 is unnecessary. For maintainability reasons, I don't get too carried away with optimizations that make little difference, though. If you have a specific suggestion for a place that should be taking advantage of one of these instruction sets, please file an issue describing that specific suggestion. Catch-all "optimize everything" issues are not helpful.
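The general pattern behind "runtime CPU detection" can be sketched as follows. This is not libdeflate's actual code: it assumes GCC or Clang on x86, uses their __builtin_cpu_supports builtin and the target function attribute, and the function names (sum_bytes, sum_bytes_generic, sum_bytes_avx2, select_sum_bytes) are hypothetical. The AVX2 variant is compiled into the binary regardless of -mavx2, but it is only called when the CPU reports AVX2 support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: built with whatever baseline flags the build uses. */
static uint32_t sum_bytes_generic(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_VARIANT 1
/* Same loop, but the target attribute allows the compiler to emit AVX2
 * instructions for this one function even without -mavx2. */
static __attribute__((target("avx2")))
uint32_t sum_bytes_avx2(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
#endif

typedef uint32_t (*sum_bytes_fn)(const uint8_t *, size_t);

/* Picks an implementation based on what the running CPU reports. */
static sum_bytes_fn select_sum_bytes(void)
{
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_bytes_avx2;
#endif
    return sum_bytes_generic;
}

/* Public entry point: selects once, then dispatches through the pointer.
 * The lazy initialization is not synchronized; this is only a sketch. */
static uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    static sum_bytes_fn impl;
    assert(buf != NULL || len == 0);
    if (impl == NULL)
        impl = select_sum_bytes();
    return impl(buf, len);
}
```

With a layout like this, a disassembler or a tool that scans the binary for instruction-set usage would still find AVX2 instructions in the library, because the AVX2 variant exists in the file even when the baseline flags exclude it.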

if you compile on a modern PC for a customer, it may not run on the customer's PC as their processor does not have the extension

This is not true. You can compile with -march=native or similar if you want to, but that's not the default.

Tested it and it does not seem to work. When I set CMAKE_C_FLAGS to -mno-avx, elfx86exts libdeflate.so still shows AVX (vmovdqa).

Since you have the runtime detection here:

libdeflate already uses SSE, AVX, NEON, etc. in various places where they help a lot, and usually runtime CPU detection is used so an option like -mavx2 is unnecessary.

Do you mean detection during compilation or during execution of the program? So if I compile the lib on a PC that supports AVX2 and then run it on a PC that doesn't, does the lib detect that by reading the CPU flags at runtime? If so, that would explain why AVX2 continues to show up: it is compiled into the lib but just not executed. Otherwise it would lead to a crash.
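For reference, "reading the CPU flags at runtime" on x86 generally means executing the CPUID instruction. A minimal sketch, assuming GCC or Clang and their cpuid.h header, is shown below. The helper names read_xcr0 and cpu_has_avx2 are hypothetical, and this is not a claim about how libdeflate implements its own detection.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

/* Reads XCR0, which tells which register state the OS saves and restores. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if both the CPU and the OS support AVX2, else 0. */
static int cpu_has_avx2(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1: ECX bit 27 = OSXSAVE, ECX bit 28 = AVX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;

    /* XCR0 bits 1 and 2: the OS preserves XMM and YMM registers. */
    if ((read_xcr0() & 0x6) != 0x6)
        return 0;

    /* Leaf 7, subleaf 0: EBX bit 5 = AVX2. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & (1u << 5)) != 0;
}
#else
/* Non-x86 targets or other compilers: report no AVX2 in this sketch. */
static int cpu_has_avx2(void)
{
    return 0;
}
#endif
```

A check of this kind runs on the machine executing the program, so its result is independent of the machine the library was compiled on.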