JuulLabs / kable

Kotlin Asynchronous Bluetooth Low-Energy

Home Page: https://juullabs.github.io/kable

Android ANR when collecting advertisements

miwright2 opened this issue · comments

My Android app is in production and uses kable. It's working great for us. But I'm getting ANRs from Crashlytics that backtrace to this line in kable.

One of the most interesting parts is that the issue affects only Samsung users. I have ~244 events over 94 users in the last 90 days.

My code essentially looks like this...

allMatchesScanner
    .advertisements
    .buffer(capacity = 5, onBufferOverflow = BufferOverflow.DROP_OLDEST)
    .flowOn(Dispatchers.IO)
    .catch { 
        stop()
    }
    .onCompletion {
        Log.d(LOG_TAG, "Scanning stopped ...")
    }
    .collect { advertisement ->
        // process advertising packet
    }

And this is the backtrace...

[inc.combustion.app_issue_4ef85018dda05154ffababd15f0888f2_ANR_session_6466959903A400015B7E1A496343B900_DNE_0_v2_stacktrace.txt](https://github.com/JuulLabs/kable/files/11521287/inc.combustion.app_issue_4ef85018dda05154ffababd15f0888f2_ANR_session_6466959903A400015B7E1A496343B900_DNE_0_v2_stacktrace.txt)

main (timed waiting):tid=1 systid=23422 
       at jdk.internal.misc.Unsafe.park(Native method)
       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
       at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:88)
       at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
       at kotlinx.coroutines.BuildersKt.runBlocking(unavailable:1)
       at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
       at kotlinx.coroutines.BuildersKt.runBlocking$default(unavailable:1)
       at kotlinx.coroutines.channels.ChannelsKt__ChannelsKt.trySendBlocking(Channels.kt:38)
       at kotlinx.coroutines.channels.ChannelsKt.trySendBlocking(unavailable:1)
       at com.juul.kable.AndroidScanner$advertisements$1$callback$1.onScanResult(Scanner.kt:48)
       at android.bluetooth.le.BluetoothLeScanner$BleScanCallbackWrapper$1.run(BluetoothLeScanner.java:669)
       at android.os.Handler.handleCallback(Handler.java:942)
       at android.os.Handler.dispatchMessage(Handler.java:99)
       at android.os.Looper.loopOnce(Looper.java:226)
       at android.os.Looper.loop(Looper.java:313)
       at android.app.ActivityThread.main(ActivityThread.java:8757)
       at java.lang.reflect.Method.invoke(Native method)
       at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:571)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1067)

Do you guys have any suggestions? I don't know if I should call the API differently to avoid this issue or if this is a bug. Any input you have would be greatly appreciated.

Let me know if there is more information I can provide.

Internally Kable applies flowOn(Dispatchers.Main.immediate) to the advertisements flow:

}.flowOn(Dispatchers.Main.immediate)

In an experiment, I tried overriding the flowOn by doing two consecutive flowOns:

Source code
import kotlinx.coroutines.CoroutineStart.UNDISPATCHED
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.channels.BufferOverflow.DROP_OLDEST
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.trySendBlocking
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.receiveAsFlow
import kotlinx.coroutines.launch
import kotlin.time.Duration.Companion.seconds

suspend fun main() {
    val channel = Channel<Int>()

    GlobalScope.launch(start = UNDISPATCHED) {
        channel
            .receiveAsFlow()
            .flowOn(Dispatchers.Main) // Commenting out this line allows data to "flow".
            .buffer(capacity = 5, onBufferOverflow = DROP_OLDEST)
            .flowOn(Dispatchers.IO)
            .collect {
                delay(1.seconds)
                println(it)
            }
    }

    (0..1_000_000_000).forEach {
        println("--> $it")
        channel.trySendBlocking(it)
    }
}

But it didn't seem to override the Dispatchers.Main (as I had expected).
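For what it's worth, this matches the documented behavior of flowOn: when consecutive flowOn operators set the same context key (here, the dispatcher), the one closest to the source takes precedence. A minimal sketch of that, assuming two hypothetical named single-thread executors ("inner" and "outer") as stand-ins for Dispatchers.Main and Dispatchers.IO; this is not Kable code:

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Executors

// Returns the name of the thread that the upstream flow body ran on.
fun upstreamThreadName(): String {
    // Hypothetical executors standing in for Main ("inner") and IO ("outer").
    val inner = Executors.newSingleThreadExecutor { r -> Thread(r, "inner") }
    val outer = Executors.newSingleThreadExecutor { r -> Thread(r, "outer") }
    return try {
        runBlocking {
            flow { emit(Thread.currentThread().name) }
                .flowOn(inner.asCoroutineDispatcher()) // closest to the source: takes precedence
                .flowOn(outer.asCoroutineDispatcher()) // same context key, so it does not override
                .first()
        }
    } finally {
        inner.shutdown()
        outer.shutdown()
    }
}

fun main() {
    // Expected to start with "inner" (coroutine debug mode may append a suffix).
    println(upstreamThreadName())
}
```

So a library-side flowOn effectively pins the upstream dispatcher, and a consumer cannot move it by adding another flowOn downstream.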

...this is all to say, I don't believe Kable should be opinionated about the dispatcher the flow is on if it can't be overridden.

Please try the following SNAPSHOT (which has the flowOn removed):

repositories {
    maven("https://oss.sonatype.org/content/repositories/snapshots")
}

dependencies {
    implementation("com.juul.kable:core:0.23.0-issue-485-1-SNAPSHOT")
}

It is based on Kable v0.23.0.

Hello @miwright2, if you still have the issue, I think it is worth removing the buffer and checking whether that resolves it.

@mmaleiter -- thanks for the suggestion.

I initially did not have the buffer, but I noticed the trySendBlocking. If @twyatt's changes alone don't work, I'll also try removing that buffer. I was also wondering if conflating the flow would help me. I'm not concerned about occasionally dropping advertising packets because they are connectionless.

Thanks for the advice and I'll give that approach a shot if needed.

@twyatt -- also, thanks for your quick input! I'm planning on trying out that snapshot and putting it out in our next release assuming it integrates fine for me. I'll keep y'all filled in after that release goes live.

I think the buffer should've been fine to use. I suspect the flowOn that Kable used internally was stalling the trySendBlocking even before your buffer had any effect.

Hopefully w/o the internal flowOn (i.e. the SNAPSHOT), the flow will operate as expected (and work w/ or w/o the buffer). 🤞
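For context on why a stalled send shows up as an ANR: trySendBlocking parks the calling thread (via runBlocking, as in the backtrace above) until the channel accepts the element, whereas trySend returns immediately. A small sketch of the non-blocking behavior, using plain channels rather than Kable's internals; the function name sendWithoutBlocking is made up for illustration:

```kotlin
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.channels.Channel

// Returns: (rendezvous trySend succeeded, DROP_OLDEST trySend succeeded, element left in the buffer).
fun sendWithoutBlocking(): Triple<Boolean, Boolean, Int?> {
    // Rendezvous channel with no receiver waiting: trySend fails at once instead of blocking.
    val rendezvous = Channel<Int>()
    val rendezvousAccepted = rendezvous.trySend(1).isSuccess

    // A DROP_OLDEST channel never makes the sender wait; the oldest element is discarded.
    val dropping = Channel<Int>(capacity = 1, onBufferOverflow = BufferOverflow.DROP_OLDEST)
    dropping.trySend(1)
    val droppingAccepted = dropping.trySend(2).isSuccess
    val kept = dropping.tryReceive().getOrNull()

    return Triple(rendezvousAccepted, droppingAccepted, kept)
}

fun main() {
    println(sendWithoutBlocking()) // (false, true, 2)
}
```

This only illustrates the channel semantics; whether dropping advertisements is acceptable depends on the app.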

trying out that snapshot and putting it out in our next release assuming it integrates fine for me

Assuming you were already on 0.23.0, then it should be a drop-in replacement w/o needing any code changes on your side.


Keep us posted. Thanks!

Hey @twyatt , I'm working with Matt on this and trying to get the snapshot you mentioned above going, but I'm failing to resolve that snapshot build. Is there a chance it might have been deleted? Thanks!

The snapshot index appears to still have it: https://oss.sonatype.org/content/repositories/snapshots/com/juul/kable/core/0.23.0-issue-485-1-SNAPSHOT/

Are you sure you added the snapshot repository?

repositories {
    maven("https://oss.sonatype.org/content/repositories/snapshots")
}

I can republish the snapshot if needed; let me know.

Yeah, I had. I'm not sure why, but I had to blow away a couple of caches, and now it works. Thanks for the quick response!

@twyatt -- Did the changes you made for this issue make it into 0.24.0?

@twyatt -- never mind, I see that it isn't in 0.24.0.

@miwright2 I'm waiting for a report back that it resolved the issue before cutting a release with that change. It does have an impact on library consumers, so I wanted to be sure that it was the actual cause/fix for the problem.

@twyatt -- sure thing -- we are about to release using that snapshot. I will let you know how it goes. I just wanted to make sure that it didn't already make it into a release.

@twyatt -- this fix looks good for us as far as the ANR goes. However, with that snapshot build, I'm getting several reports of this crash:

Fatal Exception: java.lang.NullPointerException: characteristic.value must not be null
       at com.juul.kable.gatt.Callback.onCharacteristicRead(Callback.kt:144)
       at android.bluetooth.BluetoothGatt$1$6.run(BluetoothGatt.java:405)
       at android.os.Handler.handleCallback(Handler.java:938)
       at android.os.Handler.dispatchMessage(Handler.java:99)
       at android.os.Looper.loopOnce(Looper.java:201)
       at android.os.Looper.loop(Looper.java:288)
       at android.os.HandlerThread.run(HandlerThread.java:67)

Do you know when the next release with the ANR fix merged in might be available?

Thanks for verifying the ANR fix!

I'll cut a release shortly after #537 has the necessary approvals and is merged. UPDATE: It has been released.

As for the NullPointerException you're seeing, let's track that in #540.

@miwright2 The ANR fix was included in the 0.25.0 release.