[Bug] The data flushed for output delay is NOT correct.

WeiChungChang opened this issue 2 years ago · comments

WeiChungChang commented 2 years ago

Steps to reproduce:

[1st round]

1. Write a SINGLE compressed frame (any format is OK) with aacDecoder_Fill(). In this example it is the 5th ADTS frame of an AAC file.
2. Call aacDecoder_GetStreamInfo() to get the output delay. E.g. frame size = 1024 samples, output delay = 1685 samples, channels = 2.
3. Call aacDecoder_DecodeFrame() without AACDEC_FLUSH. The first 1024 output samples belong to the output delay, so we drop them here. 1685 - 1024 = 661 samples still need to be compensated (dropped).
4. Stop decoding. Because of the 1685-sample output delay, call aacDecoder_DecodeFrame() with AACDEC_FLUSH. In this example 661 samples still have to be dropped and the frame size is 1024 samples, so we need to call it (1024 + 661 + (1024 - 1)) / 1024 = 2 times (integer division).
5. The 2 outputs should look like `|XXX(661 samples)***(363 samples)|***(661 samples)---(padding samples)|`, where `X` represents samples to be dropped, `*` represents samples to be output, and `-` represents padding samples (don't care).
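
The arithmetic in steps 3 to 5 can be sketched as a small planning helper. This is only an illustration of the expected layout, not decoder code: `FlushFrame`, `flush_calls` and `plan_flush` are hypothetical names that do not exist in the FDK AAC API, and the sketch assumes the decoder holds exactly `output_delay` samples when flushing starts.

```c
#include <assert.h>

/* Hypothetical helper type, not part of the FDK AAC API. */
typedef struct {
    int drop; /* leading samples that still belong to the output delay (X) */
    int keep; /* samples to be output (*) */
    int pad;  /* trailing padding samples, don't care (-) */
} FlushFrame;

/* Number of AACDEC_FLUSH calls needed to drain output_delay samples
 * when each call returns frame_size samples (rounded up). */
static int flush_calls(int output_delay, int frame_size) {
    assert(frame_size > 0 && output_delay >= 0);
    return (output_delay + frame_size - 1) / frame_size;
}

/* Fill plan[] with the expected layout of every flushed frame.
 * remaining_drop is the part of the delay not yet dropped before flushing.
 * Returns the number of flush calls, or -1 if plan[] is too small. */
static int plan_flush(int output_delay, int frame_size, int remaining_drop,
                      FlushFrame *plan, int plan_len) {
    int calls = flush_calls(output_delay, frame_size);
    int remaining_total = output_delay; /* samples assumed still in the decoder */
    if (calls > plan_len) {
        return -1;
    }
    for (int i = 0; i < calls; i++) {
        int valid = remaining_total < frame_size ? remaining_total : frame_size;
        int drop = remaining_drop < valid ? remaining_drop : valid;
        plan[i].drop = drop;
        plan[i].keep = valid - drop;
        plan[i].pad = frame_size - valid;
        remaining_drop -= drop;
        remaining_total -= valid;
    }
    return calls;
}
```

With the numbers from this report, `plan_flush(1685, 1024, 661, plan, 4)` returns 2 and yields drop=661, keep=363, pad=0 for the first flushed frame and drop=0, keep=661, pad=363 for the second.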

[2nd round]

Same as the 1st round, but decode several compressed frames so that the output delay is fully compensated before flushing (e.g. with the previous example, decode 5 compressed AAC frames, the 1st to 5th ADTS frames).
(Repeat steps 1 - 3 to write SEVERAL compressed frames from the same test file as in the 1st round.)

1. Write several compressed frames from the same test file as in the 1st round with aacDecoder_Fill().
2. Call aacDecoder_GetStreamInfo() to get the output delay. E.g. frame size = 1024 samples, output delay = 1685 samples, channels = 2.
3. Call aacDecoder_DecodeFrame() without AACDEC_FLUSH. The first 1685 output samples belong to the output delay, so we drop them here. No samples remain to be compensated (dropped).
4. Stop decoding. Because of the 1685-sample output delay, call aacDecoder_DecodeFrame() with AACDEC_FLUSH. In this case nothing is left to drop, but 1024 + 661 = 1685 samples are still to be output and the frame size is 1024 samples, so we again need to call it (1024 + 661 + (1024 - 1)) / 1024 = 2 times (integer division).
5. The 2 outputs should look like `|***(1024 samples)|***(661 samples)---(padding samples)|`, where `*` represents samples to be output and `-` represents padding samples (don't care).

Expected result:

The 1024 output samples of the 1st round should be identical to the last 1024 samples of the 2nd round, since both come from the 5th compressed frame of the same AAC file.
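
One way to check this expectation is a sample-by-sample comparison of the two outputs. The sketch below assumes 16-bit PCM output and that both buffers have already been trimmed to the same samples of the 5th frame; `MismatchRange` and `compare_pcm` are hypothetical helpers, not part of the FDK AAC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, not part of the FDK AAC API. */
typedef struct {
    long first;   /* index of the first differing sample, -1 if none */
    long last;    /* index of the last differing sample, -1 if none */
    size_t count; /* number of differing samples */
} MismatchRange;

/* Compare n samples of two PCM buffers and report where they differ. */
static MismatchRange compare_pcm(const int16_t *a, const int16_t *b, size_t n) {
    MismatchRange r = { -1, -1, 0 };
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            if (r.first < 0) {
                r.first = (long)i; /* remember where the mismatch starts */
            }
            r.last = (long)i;      /* keep extending the end of the range */
            r.count++;
        }
    }
    return r;
}
```

If the two rounds agreed, `first` and `last` would both be -1 and `count` would be 0. For stereo data, dividing an index by the channel count gives the per-channel sample position, assuming interleaved samples.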

Result

The output from [1st round] is noisy in roughly its first half (samples 0 to ~550); after that, the output samples match again.

Please see the figures below; the left-hand side is from [2nd round] and the right-hand side is from [1st round].
Notice that they match exactly after ~550 samples but differ in samples 0 to ~550.