mstorsjo / fdk-aac

A standalone library of the Fraunhofer FDK AAC code from Android.

Home Page: https://sourceforge.net/projects/opencore-amr/


Assertion failed: L_num >= (FIXP_DBL)0

wader opened this issue · comments

Versions:
fdk-aac 3f864cc (master at report time)

Reproduction:

$ ./aac-enc -r 64000 -t 5 libfdk_aac_he_64k_assert_16bit.wav out.aac
Assertion failed: (L_num >= (FIXP_DBL)0), function fDivNorm, file fixpoint_math.cpp, line 457.

libfdk_aac_he_64k_assert_16bit.wav is a volume-lowered and resampled cutout from a 24-bit FLAC source file with regions of heavy clipping. Unfortunately I can't share the source, but I hope the cutout is enough. The assertion only seems to trigger with AOT 5 (HE-AAC) and bitrates >= 60k.

libfdk_aac_he_64k_assert_16bit.wav.zip

Dug a bit more, though I'm not sure what I'm doing. It looks like some fixed-point number overflows and becomes negative.

Added:

printf("L_num= float=%g int=%d hex=%x\n", FX_DBL2FL(L_num), L_num, L_num);
printf("L_denum= float=%g int=%d hex=%x\n", FX_DBL2FL(L_denum), L_denum, L_denum);

Then i see:

L_num= float=-0.0198283 int=-42580882 hex=fd76446e
L_denum= float=0.00382481 int=8213714 hex=7d54d2
Assertion failed: (L_num >= (FIXP_DBL)0), function fDivNorm, file fixpoint_math.cpp, line 462.

My layman's understanding is that the overflow happens in calculateSbrEnvelope when energy sums are calculated. Would it be possible to clamp the value somehow?

Thanks for the report, I'm able to reproduce this.

Your clue seems quite right. If we build fdk-aac with sanitizers enabled (this is a bit finicky, as we're linking as C even though the code is C++, so the build currently fails with sanitizers under Clang; I'd recommend the options -fsanitize=address,undefined -fno-omit-frame-pointer -fno-sanitize=shift-base -DDEBUG -fno-sanitize-recover=all), we hit the following error:

../fdk-aac/libSBRenc/src/env_est.cpp:640:11: runtime error: signed integer overflow: 1356066622 + 1053884691 cannot be represented in type 'int'

That's possible to work around by using saturating adds like this:

diff --git a/libSBRenc/src/env_est.cpp b/libSBRenc/src/env_est.cpp
index cc8780a..e7eea37 100644
--- a/libSBRenc/src/env_est.cpp
+++ b/libSBRenc/src/env_est.cpp
@@ -637,8 +637,8 @@ static FIXP_DBL getEnvSfbEnergy(
     for (; l < stop_pos; l++) {
       nrg2 += YBuffer[l >> YBufferSzShift][k] >> sc1;
     }
-    accu1 += (nrg1 >> dynScale1);
-    accu2 += (nrg2 >> dynScale2);
+    accu1 = fAddSaturate(accu1, (nrg1 >> dynScale1));
+    accu2 = fAddSaturate(accu2, (nrg2 >> dynScale2));
   }
   /* This shift factor is always positive. See comment above. */
   nrgSum +=

I can try to test this more thoroughly and try to push a fix later.

Ok, thanks! That would be great.

Looking at the code, I get a feeling there might be similar issues lurking around? But I guess it's a lot of work to go through and test things?


Yes, the code is essentially littered with such potential issues. I think the main point is that, in most cases where saturated adds aren't used, some higher-level logic should guarantee that the overflows can't be triggered with any possible input.

For the decoder, it has been run through fuzzers quite a bit over a few years, so most such issues should have been fixed there. For the encoder, these issues are usually triggered by "unusual" input, e.g. input with decoding artifacts or clipping.

If we'd be able to set up fuzzing of the encoder (in a way that it actually catches interesting things in a meaningful amount of time) we could probably fix more such issues though.

Also, as you probably know - I don't really develop this myself, I just repackage what Fraunhofer releases via Android/AOSP. AFAIK Android themselves have spent some effort on fuzzing the decoder as well (there were quite a few such fixes a couple years back, but these days there doesn't seem to be much left), but I'm not sure how much effort they've spent on the encoder side.

Yeah, understood, and I'm very thankful you're making the library easier to build and use. Do you know if Fraunhofer keeps an eye on fixes in this fork? I have some Fraunhofer contacts, so maybe I can ping someone and see if they care about these kinds of fixes.

I think this particular assert is the only one I've seen, and only twice, in a system that transcodes a lot of audio, so these kinds of bugs seem to be very rare in practice. Based on your fix, I should at least be able to help draft fixes if it happens again.


Not sure if they check explicitly here or not; I used to have a Fraunhofer contact that I talked to occasionally as well, but I haven't heard from them in a few years now.

if building fdk-aac with sanitizers enabled (this is a bit finicky, as we're linking as C even if the code is C++, so it fails if building with sanitizers with Clang currently); I'd recommend the options -fsanitize=address,undefined -fno-omit-frame-pointer -fno-sanitize=shift-base -DDEBUG -fno-sanitize-recover=all

Just for the record: with Clang 17.0, this isn't an issue any longer, and with older Clang one can add -fno-sanitize=function to avoid the references to C++ symbols.

I pushed a bunch of CI improvements to git master, and pulled in a few (very minor) updates from upstream. However, the regression test that I added shows that the diff I made above actually does affect the output, even in cases where we thought we weren't triggering any overflows before.

From looking at the implementation of fAddSaturate, it seems like it loses the last bit of precision. I guess that should be tolerable - but it does make a difference in the encoder output.

Oh, lots of stuff happening on master :) That is interesting; I would have expected no difference, but looking at the code I'm not that surprised.

I got word that the change in output due to saturation here, should be acceptable - so I pushed a commit to master that does this, and updates the test references.

@mstorsjo Thanks! Sorry, I probably could have tested this fix for regressions in some systems where it's used, but I didn't think of that. I'll let you know if I run into something.


No problem; given the nature of the change, I believe the risk of regression is near zero - it's mostly a question of whether the loss in accuracy matters. (Theoretically, of course, the loss of accuracy could make otherwise-working inputs trigger other corner cases elsewhere...) But I think it should be fine, and my Fraunhofer contacts also said it should be ok.

Planning on tagging a version soonish?


I haven't given it much thought, TBH. There hasn't been a huge amount of changes since the last version, but it's been a couple years since last time, so I guess it could be time.

I can probably try to make a new release within the next few weeks.

Great, thanks again!

@mstorsjo - a question here on the fix/implementation. fAddSaturate is defined as follows:

inline FIXP_DBL fAddSaturate(const FIXP_DBL a, const FIXP_DBL b) {
  LONG sum;

  sum = (LONG)(a >> 1) + (LONG)(b >> 1);
  sum = fMax(fMin((INT)sum, (INT)(MAXVAL_DBL >> 1)), (INT)(MINVAL_DBL >> 1));
  return (FIXP_DBL)(LONG)(sum << 1);
}

... which indeed loses the LSB. I understand what the function is doing here but why do they not trivially fix that at the end e.g.

  return (FIXP_DBL)(LONG)((sum << 1) + ((a & 1) ^ (b & 1)));

This would avoid the precision loss which it seems is quite undesirable.


I'm not sure I'm following correctly - that doesn't seem like it would give the right result? If both a & 1 == 1 and b & 1 == 1, then the extra term ((a & 1) ^ (b & 1)) won't add anything - but that's exactly the case where we'd need to carry the resulting bit through into the rest of sum << 1, while redoing the saturation.

Without manually checking, I think something like this could be accurate though:

  sum = (LONG)(a >> 1) + (LONG)(b >> 1) + ((a & 1) & (b & 1));
  sum = fMax(fMin((INT)sum, (INT)(MAXVAL_DBL >> 1)), (INT)(MINVAL_DBL >> 1));
  return (FIXP_DBL)((LONG)(sum << 1) + ((a & 1) ^ (b & 1)));

As for why that's not done: I'm not sure - the reason might be buried deep in development history within Fraunhofer. My hunch would be that this function is mainly used for values where the loss of accuracy in the lowest bit is tolerable, and complicating the function might affect performance (although the cost looks quite negligible).

@mstorsjo - yes, I believe your correction is right (although again I haven't tested).

I guess my question was that I'm not sure whether the change made by this commit (which loses the LSB in getEnvSfbEnergy whereas previously it was not lost) matters or not - which may be a different question to whether losing the LSB during Fraunhofer's prior usage of fAddSaturate matters or not.


That's a fair question I guess. FWIW, I did run this change through Fraunhofer and they said it should be ok, but I'm not sure if they did any extensive analysis on it.

I passed this question back to them as well, let's see if they have any good insights on the matter.


It took more than a few weeks, but now there's a new release tagged, and with a release tarball on sourceforge.

🥳 no worries, thanks and happy holidays!