Move to better test infrastructure

jdemel opened this issue 7 months ago

In a lot of issues and PRs we discuss problems with our current tests.

We need to discuss a way forward to improve this situation.

One option would be to introduce gtest. We would write specific tests for some kernels first and then gradually migrate the rest to the new system.

Johannes Demel · Answer 1 · Sun Nov 05 2023 19:04:14 GMT+0800 (China Standard Time)

I did some tests with gtest:
https://github.com/jdemel/volk/tree/newtest

At the moment, there are quite a few areas where this can be improved:

Integration into ctest.
Output prints should go into the log instead of the default output.
Copy-pasted code should be reduced.

Thus, this implementation is a proof of concept and open for discussion.

Clayton Smith · Answer 2 · Sun Nov 05 2023 22:55:55 GMT+0800 (China Standard Time)

I don't have an opinion on which test framework to use, but I'll list out some things that could be improved by moving away from one-size-fits-all testing:

No more puppets!
Many kernels have fixed-length inputs (e.g. sum_of_poly) or outputs (e.g. dot_prod, index_max, stddev) but the current system always supplies variable-length buffers. This makes it difficult to catch buffer overruns on the fixed-length buffers.
All buffers are padded by 5 (vlen_twiddle), apparently to help catch out-of-bounds writes and prevent fixed-length buffers from becoming too short (see above). But this prevents tools like ASAN and valgrind from catching buffer overruns, including out-of-bounds reads.
Some kernels (e.g. index_min, index_max) only make sense for vector lengths >= 1, so length 0 should be disallowed for them.
The current tolerance options are "relative" (to the output magnitude) and "absolute". Neither of these makes much sense for kernels like dot_prod, where the error magnitude is proportional to the vector length, and is independent of the output magnitude. (If the dot product happens to be close to zero, the relative error becomes large.)
Kernels with rounded integer output are forced to use tolerance 1, even though very few of the floating point values have a fractional part near 0.5 (e.g. #647).
All floating-point kernels are tested with uniformly distributed inputs in the range -1 .. +1. For some kernels (e.g. pow, sqrt) such inputs are inappropriate, resulting in bugs like #649.
Almost all kernels are tested with the fixed scalar value 327.0, which may not be appropriate (e.g. #381).
Special cases (e.g. 0.0) are untested, allowing bugs like #622, #701, and #730 to slip through.
The 32fc_index_* kernels can have multiple possible correct answers, so the test framework should allow that. See #700 for more details.
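
Two of the points above, a tolerance that scales with vector length for dot_prod and multiple acceptable answers for the 32fc_index_* kernels, could be handled by kernel-specific checks roughly like the following. This is a minimal sketch in plain C++, not VOLK code. The helper names dot_prod_close and is_valid_argmax, the per_element_tol and mag_sq_tol parameters, and the linear error model are hypothetical. It also assumes that the ambiguity in the index kernels comes from inputs whose magnitudes are equal or nearly equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: accept a dot-product result if its absolute error is
// within a bound that grows linearly with the vector length. The bound does
// not depend on the output magnitude, so results near zero are not penalised.
// The linear model and per_element_tol are assumptions, not VOLK values.
bool dot_prod_close(float expected,
                    float actual,
                    std::size_t num_points,
                    float per_element_tol)
{
    assert(per_element_tol >= 0.0f);
    const float bound = per_element_tol * static_cast<float>(num_points);
    return std::fabs(expected - actual) <= bound;
}

// Hypothetical helper: accept any index whose squared magnitude is within
// mag_sq_tol of the largest squared magnitude in the input, so that ties
// (exact or within rounding) all count as correct answers.
// Empty inputs and out-of-range indices are rejected.
bool is_valid_argmax(const std::vector<std::complex<float>>& in,
                     std::size_t index,
                     float mag_sq_tol)
{
    if (in.empty() || index >= in.size()) {
        return false;
    }
    float max_mag_sq = 0.0f;
    for (const auto& v : in) {
        max_mag_sq = std::max(max_mag_sq, std::norm(v)); // norm = |v|^2
    }
    return std::norm(in[index]) >= max_mag_sq - mag_sq_tol;
}
```

A test would call the kernel under test, then pass its result to one of these checks instead of comparing against a single reference value with a fixed relative or absolute tolerance.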

Clayton Smith · Answer 3 · Sat Dec 09 2023 06:38:17 GMT+0800 (China Standard Time)

Another problematic case:

volk_32f_s32f_32f_fm_detect_32f and volk_32f_s32f_s32f_mod_range_32f both involve phase angle calculations, and tolerance checks fail if the angles being compared are -pi+epsilon and +pi-epsilon. (The difference in angle is very small, but the difference in absolute value is large.)
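
One way to avoid this failure mode is to compare values modulo the wrap period rather than by their raw difference. The sketch below is plain C++ and not part of VOLK. The helper names wrapped_diff and wrapped_close are hypothetical, and the caller is assumed to know the period (2*pi for a phase angle, or the width of the range for a range-wrapping kernel).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: signed difference between a and b after wrapping into
// [-period/2, +period/2]. std::remainder rounds the quotient to the nearest
// integer, which yields exactly this symmetric range.
float wrapped_diff(float a, float b, float period)
{
    assert(period > 0.0f);
    return std::remainder(a - b, period);
}

// Hypothetical helper: two values are close if their wrapped difference is
// within tol, so -pi+epsilon and +pi-epsilon compare as nearly equal.
bool wrapped_close(float expected, float actual, float period, float tol)
{
    assert(tol >= 0.0f);
    return std::fabs(wrapped_diff(expected, actual, period)) <= tol;
}
```

With a period of 2*pi, the angles -pi+0.001 and +pi-0.001 differ by about 0.002 after wrapping, so a tolerance of 0.01 accepts them, while a plain absolute comparison would see a difference of roughly 6.28.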