libsdl-org / SDL_mixer

An audio mixer that supports various file formats for Simple Directmedia Layer.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Deadlock on Mix_PlayChannel

ebarlas opened this issue · comments

Today I encountered a strange SDL Mixer issue on my Pixel 6 Pro Android device.

If an alarm notification occurs while my SDL indie game app is running, the SDL Mixer system gets irreparably harmed.

Post-notification, calls to the SDL Mixer system stall waiting to obtain a mutex lock.

I tried rearranging interactions with SDL Mixer in a few ways, but nothing works. It seems be out of my hands as the app developer.

Screen Shot 2023-05-26 at 2 43 35 PM

What are the callstacks of the other threads in the application? In particular, it would be helpful to know what else already has the audio mutex locked.

Thanks for the reply. This is the only application interaction with the SDL API. The lock appears to be held by the internal SDL audio processing thread. I'll share that call stack later today.

@ebarlas
can you check, if it's using aaudio or opensles ?
(put some trace in src/audio/aaudio or src/audio/opensles )

If it's aaudio, maybe this is a duplicate of libsdl-org/SDL#4985 ?

@ebarlas,
Also, if you can reproduce it, try to force a back-end (aaudio or opensles) by forcing them in src/audio/SDL_audio.c
(&aaudio_bootstrap, or &openslES_bootstrap,)
to see if this is a back-end dependent issue

Here's a more complete view of the two relevant threads, (1) main application thread and (2) SDL mixer audio thread . After the alarm overlay, with sound and vibration, a deadlock ensues. I paused the debugger to freeze program execution at this point.

Also, I forgot to note that I'm using SDL_Mixer 2.6.3.

Screen Shot 2023-05-31 at 6 55 01 PM Screen Shot 2023-05-31 at 6 56 19 PM

I just confirmed that the AAudio driver is being used currently, @1bsyl. I'll try toggling.

When I switch to openslES the problem is non-existent!

All of the Android defaults have worked well across SDL libraries, so I haven't toggled them. However, do you recommend that I disable the AAudio driver?

That seems odd that the DRMP3 mixing process isn't releasing the lock. Can you verify that it actually completes in this case? Maybe the lock is taken somewhere else and then not released properly?

I had the same thought. Indeterminate audio processing would be strange. I'll try to annotate the code to gather more timing info. But from everything I've gathered so far, that application thread is completely stalled there waiting to obtain the lock from the audio processing thread.

I don't exactly remember why, because of previous issue, I use this patch for aaudio:
comment out DetectBrokenState + re-open

diff --git a/src/audio/aaudio/SDL_aaudio.c b/src/audio/aaudio/SDL_aaudio.c
index 76fe4e710..68f8c402d 100644
--- a/src/audio/aaudio/SDL_aaudio.c
+++ b/src/audio/aaudio/SDL_aaudio.c
@@ -212,6 +212,16 @@ static void aaudio_PlayDevice(SDL_AudioDevice *_this)
     aaudio_result_t res;
     int64_t timeoutNanoseconds = 1 * 1000 * 1000; /* 8 ms */
     res = ctx.AAudioStream_write(private->stream, private->mixbuf, private->mixlen / private->frame_size, timeoutNanoseconds);
+    if (res < 0) {
+        if (res == AAUDIO_ERROR_DISCONNECTED) {
+            ctx.AAudioStream_close(private->stream);
+            ctx.AAudioStreamBuilder_openStream(ctx.builder, &private->stream); // reopen stream (mem leak probably)
+            ctx.AAudioStream_requestStart(private->stream); // start new stream
+            res = ctx.AAudioStream_write(private->stream,  private->mixbuf, private->mixlen / private->frame_size, timeoutNanoseconds);
+            SDL_Log("SDL AAudio play: DISCONNECTED / RECONNECTED");
+        }
+    }
+
     if (res < 0) {
         LOGI("%s : %s", __func__, ctx.AAudio_convertResultToText(res));
     } else {
@@ -475,6 +485,7 @@ void aaudio_ResumeDevices(void)
 */
 SDL_bool aaudio_DetectBrokenPlayState(void)
 {
+#if 0
     int i;
 
     /* AAUDIO driver is not used */
@@ -515,7 +526,7 @@ SDL_bool aaudio_DetectBrokenPlayState(void)
 
         (void) captureDevice;
     }
-
+#endif
     return SDL_FALSE;
 }

but I am not sure if this is related.
maybe, here aaudio has different timings than opensles, and it spots a bug in sdl mixer. do you have up-to date sdl_mixer / sdl3?

is your main event loop blocked ? audio is blocked ? do you have only the SDL main or various threads ?

I'm using the latest SDL 2 release: 2.6.3.

Yes, main loop is blocked as shown on SDL mixer API call. I haven't instrumented the code yet, so it's hard to make specific claims about timing on the main and audio threads.

I'll add that shortly and report back.

I reproduced the issue with a bit of extra logging in mixer.c to paint a clearer picture.

I added the following snippets at the beginning and end of void mix_channels(...) called by the internal audio processing thread:

SDL_Log("mix_channels enter thread=%lu, ticks=%d", SDL_ThreadID(), SDL_GetTicks());
...
SDL_Log("mix_channels exit thread=%lu, ticks=%d", SDL_ThreadID(), SDL_GetTicks());

and the following at the beginning and end of int Mix_PlayChannel(...) called by the application:

SDL_Log("Mix_PlayChannel enter, thread=%lu, ticks=%d", SDL_ThreadID(), SDL_GetTicks());
...
SDL_Log("Mix_PlayChannel exit, thread=%lu, ticks=%d", SDL_ThreadID(), SDL_GetTicks());

The results are striking.

The Android alarm fired at about SDL-ticks 34000. Calls to mix_channels in the audio processing thread immediately increase about 100x. Prior to SDL-ticks 34000, mix_channels was called about 20x per second. Subsequently, it is called about 2000x per second.

So the audio processing workload is literally starving the application thread of access.

$ egrep "ticks=[0-9]{5}" sdl_log2.txt | grep "ticks=33" | grep "mix_channels enter" | wc -l
      22
$ egrep "ticks=[0-9]{5}" sdl_log2.txt | grep "ticks=36" | grep "mix_channels enter" | wc -l
    2022

sdl_log2.txt

mix_channels() is callback of SDL_OpenAudioDevice, set from SDL_mixer/src/mixer.c

It's called from SDL/src/audio/SDL_audio.c ( line ~592), so this seems to loop to much.

In normal case, the delay should come from end to the function, 3rd if-block. (unless you need to convert sample, that would use the 1 if-block).
is this still happening when the bug occurs ?

 642         } else { /* writing directly to the device. */
 643             /* queue this buffer and wait for it to finish playing. */
 644             current_audio.impl.PlayDevice(device);
 645             current_audio.impl.WaitDevice(device);

WaitDevice, has no implementation (but also for other backend),
so what about aaudio_PlayDevice() ?
Is there an error triggered inside ? (you need to enable the LOGI define in src/audio/aaudio/SDL_aaudio.c)

I haven't been able to reproduce the issue, by using a timer alarm. this still loops normally.
because the error case should be handle with a Delay.
there is a underflow block also, that could give informations