deviceplug / btleplug

Rust Cross-Platform Host-Side Bluetooth LE Access Library

When a BLE connection fails on Android, the application aborts at runtime due to a pending ClassNotFoundException for "io.github.gedgygedgy.rust.future.FutureException"

ulankoti opened this issue · comments

logcat-filtered.txt

app-debug.apk.zip

Describe the bug
When a BLE connection fails on Android, the application aborts at runtime because of a pending "class not found" exception.
The exception is java.lang.ClassNotFoundException, and the class it cannot find is "io.github.gedgygedgy.rust.future.FutureException".
I wrote an Android application that loads a Rust native library and prints the services and characteristics of nearby BLE peripherals. The Rust library uses btleplug/examples/discover_adapters_peripherals.rs, and the function is called from a JNI method.

Expected behavior
When a BLE connection times out or is rejected, the error should be printed and the code should move on to the connection request for the next peripheral in the scanned list. The Android application should not stop.

Actual behavior
Instead, the application aborts at runtime due to a pending ClassNotFoundException for "io.github.gedgygedgy.rust.future.FutureException". However, that class is part of a Java library that droidplug-debug.aar depends on, and droidplug-debug.aar is added as a build-time dependency of the Android application.
No issue is observed when the BLE connection succeeds.

Additional context
The logcat log and the APK are attached for reference.
A snippet of the crash signature is pasted below.

2023-06-11 10:27:46.853 21775-21798 BluetoothGatt com.example.btleplugex D onClientConnectionState() - status=133 clientIf=9 device=DA:7C:35:35:F2:DF
2023-06-11 10:27:46.857 21775-21807 btleplugex-jni com.example.btleplugex E btleplugex::discover_adapters_peripherals: Error connecting to peripheral, skipping: Other(JavaException)
2023-06-11 10:27:46.858 21775-21807 mple.btleplugex com.example.btleplugex A java_vm_ext.cc:594] JNI DETECTED ERROR IN APPLICATION: JNI GetMethodID called with pending exception java.lang.ClassNotFoundException: Didn't find class "io.github.gedgygedgy.rust.future.FutureException" on path: DexPathList[[directory "."],nativeLibraryDirectories=[/system/lib64, /system_ext/lib64, /system/lib64, /system_ext/lib64]]
java_vm_ext.cc:594] at java.lang.Class dalvik.system.BaseDexClassLoader.findClass(java.lang.String) (BaseDexClassLoader.java:259)
java_vm_ext.cc:594] at java.lang.Class java.lang.ClassLoader.loadClass(java.lang.String, boolean) (ClassLoader.java:379)
java_vm_ext.cc:594] at java.lang.Class java.lang.ClassLoader.loadClass(java.lang.String) (ClassLoader.java:312)
java_vm_ext.cc:594]
java_vm_ext.cc:594] in call to GetMethodID
2023-06-11 10:27:46.985 21775-21807 mple.btleplugex com.example.btleplugex A runtime.cc:682] Runtime aborting...
runtime.cc:682] Dumping all threads without mutator lock held
.....

Hi @qdot,
Could you help check the reported issue?

@ulankoti
As far as I can see in the log, your main problem is probably on the Java side:

java_vm_ext.cc:594] JNI DETECTED ERROR IN APPLICATION: JNI GetMethodID called with pending exception java.lang.ClassNotFoundException: Didn't find class "io.github.gedgygedgy.rust.future.FutureException"
I couldn't find that class inside the APK.

Hi @blandger

classes.dex inside the APK does contain the FutureException class. The ClassNotFoundException is seen only when a connection fails.
I have attached an image of the parsed APK for reference.
[image: parsed APK]

From my fairly long Java experience, this looks like a JVM issue. Yes, it probably happens on the error path in 'discover_adapters_peripherals'. To me it looks like a JVM class loader issue; that is the first thing that comes to mind. The path DexPathList[[directory "."],nativeLibraryDirectories=[/system/lib64, /system_ext/lib64, /system/lib64, /system_ext/lib64]] looks strange.

.............................
2023-06-11 10:27:46.853 21775-21798 BluetoothGatt com.example.btleplugex D onClientConnectionState() - status=133 clientIf=9 device=DA:7C:35:35:F2:DF
2023-06-11 10:27:46.857 21775-21807 btleplugex-jni com.example.btleplugex E btleplugex::discover_adapters_peripherals: Error connecting to peripheral, skipping: Other(JavaException)
2023-06-11 10:27:46.858 21775-21807 mple.btleplugex com.example.btleplugex A java_vm_ext.cc:594] JNI DETECTED ERROR IN APPLICATION: JNI GetMethodID called with pending exception java.lang.ClassNotFoundException: Didn't find class "io.github.gedgygedgy.rust.future.FutureException" on path: DexPathList[[directory "."],nativeLibraryDirectories=[/system/lib64, /system_ext/lib64, /system/lib64, /system_ext/lib64]]
java_vm_ext.cc:594] at java.lang.Class dalvik.system.BaseDexClassLoader.findClass(java.lang.String) (BaseDexClassLoader.java:259)
java_vm_ext.cc:594] at java.lang.Class java.lang.ClassLoader.loadClass(java.lang.String, boolean) (ClassLoader.java:379)
java_vm_ext.cc:594] at java.lang.Class java.lang.ClassLoader.loadClass(java.lang.String) (ClassLoader.java:312)
java_vm_ext.cc:594]
java_vm_ext.cc:594] in call to GetMethodID
2023-06-11 10:27:46.985 21775-21807 mple.btleplugex com.example.btleplugex A runtime.cc:682] Runtime aborting...
.................................

The rest of the log seems to be just thread dumps.

Hi @blandger

Those classes are not registered with the Java VM. Instead, jni-utils-rs has a classcache that substitutes for the needed calls.

Commenting out the portion below in jni-utils-rs appears to resolve the crash.
Could you help check why the caught and cleared exception needs to be re-thrown?

repo: https://github.com/deviceplug/jni-utils-rs.git
branch: master

jni-utils-rs$ git diff rust/exceptions.rs 
diff --git a/rust/exceptions.rs b/rust/exceptions.rs
index 468365b..54aeae0 100644
--- a/rust/exceptions.rs
+++ b/rust/exceptions.rs
@@ -86,10 +86,10 @@ impl<'a: 'b, 'b, T> TryCatchResult<'a, 'b, T> {
                         let ex = env.exception_occurred()?;
                         let _auto_local = env.auto_local(ex.clone());
                         env.exception_clear()?;
-                        if env.is_instance_of(ex, class)? {
-                            return block(ex).map(|o| Some(o));
-                        }
-                        env.throw(ex)?;
+//                        if env.is_instance_of(ex, class)? {
+//                            return block(ex).map(|o| Some(o));
+//                        }
+//                        env.throw(ex)?;
                     }
                     Ok(None)
                 })()

> Looks like commenting below portion in jni-utils-rs resolved the crash. Could you please help check what is the need to re-throw the caught and cleared exception ?

It's quite hard for me to explain Rust code I don't know... sorry.

I just opened a pull request that fixes this bug, but we still have problems when the exception is thrown, because it is never cleared on the JNI env and we get another exception like this one: https://gist.github.com/trik/1f219645e6dee2440dd2f7417b817388

As a workaround, I tried clearing it manually when the peripheral object is built from the env:
trik@732716a#diff-1689a2ca69ba3382818af564e18f1d645e271736ce3b365e8414a3d266723157R117
But it's just a dirty workaround: if the exception is thrown and not cleared by the with_obj method, and we then call another method on the peripheral (e.g. stop_scan), we still get the same error.

I think that for some reason this part is never executed:
https://github.com/deviceplug/jni-utils-rs/blob/2a585382865e151c7aba0e0e8685fbb0d0fdff50/rust/exceptions.rs#L85
Debugging Rust on Android is a real pain; I will try to investigate further.

Hi @trik
I think your change will clear all exceptions, and the application will lose the ability to catch specific kinds of exceptions and act on them.

The problem is resolved for now, but stability issues are still observed with other exceptions from Java and the Android BT framework. A pull request will be created for peer review after thorough validation.

It looks like the btleplug Android changes were validated for positive cases, but negative flows that raise exceptions make the application crash at runtime.

> Looks like btleplug android changes are validated for positive cases, but negative flow causing exceptions making the application to crash at runtime.

I quite agree with you. That was one of my first thoughts when looking at the Rust code that handles Java exceptions.

Hi @ulankoti,

I totally agree; that's why I said it was a dirty workaround. I just needed to run some tests on a real device for the other pull request I opened for the descriptors.

The point is that the Java FutureException should be caught by jni-utils, which should clear the exception on the JNI env and raise the Rust error instead, but for some reason this does not happen. The env clear happens on the JavaException match: https://github.com/deviceplug/jni-utils-rs/blob/2a585382865e151c7aba0e0e8685fbb0d0fdff50/rust/exceptions.rs#L82
so we are probably not getting there.

@ulankoti @blandger
I found the culprit:

env.throw(ex)?;

Here the exception is rethrown on the JNI environment.
The solution should be to handle all the other exceptions that are children of the generic BluetoothException and return a custom error for each of them.

IMHO the exception should not be rethrown on the JNI environment, even when the generic JavaError is returned; otherwise any subsequent call on the same env will fail.
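
To make the "any subsequent call on the same env will fail" point concrete, here is a toy model in plain Java. It is only a sketch: ToyEnv, failingCall, call and takePending are hypothetical names invented for illustration, not the JNI or jni-rs API. The model assumes what the logs above show, namely that a call such as GetMethodID made while an exception is still pending is treated as a fatal error.

```java
import java.util.Optional;

/** Toy stand-in for a JNI environment; NOT the real JNI or jni-rs API. */
final class ToyEnv {
    // Models the pending exception that stays attached to the env.
    private Throwable pending;

    /** Models a Java call that fails: the exception is left pending. */
    void failingCall(Throwable t) {
        pending = t;
    }

    /** Models any later JNI call, such as GetMethodID. */
    String call(String name) {
        if (pending != null) {
            // The runtime in the logs aborts here; the toy model throws instead.
            throw new IllegalStateException(
                    name + " called with pending exception " + pending);
        }
        return name + " ok";
    }

    /** Hands the pending exception to the caller and clears it on the env. */
    Optional<Throwable> takePending() {
        Optional<Throwable> taken = Optional.ofNullable(pending);
        pending = null;
        return taken;
    }
}
```

In this model, a caller that uses takePending() and turns the result into its own error value leaves the env usable, while a caller that puts the exception back (the equivalent of env.throw(ex)) makes the next call() fail.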

@trik
The original intention might have been that, after all the known FutureExceptions are handled in the Rust/JNI layers, any remaining or unknown exceptions are re-thrown to be handled in the Java layer.

Also, I observed a NoSuchElementException while testing BLE connections back to back.

jni_utils::task: JPollResult::get
jni::wrapper::jnienv: exception found, returning error
java_vm_ext.cc:579] JNI DETECTED ERROR IN APPLICATION: JNI GetMethodID called with pending exception java.util.NoSuchElementException:
java_vm_ext.cc:579] at java.lang.Object java.util.LinkedList.removeFirst() (LinkedList.java:270)
java_vm_ext.cc:579] at java.lang.Object java.util.LinkedList.remove() (LinkedList.java:685)
java_vm_ext.cc:579] at java.lang.Object io.github.gedgygedgy.rust.stream.QueueStream.lambda$pollNext$0$io-github-gedgygedgy-rust-stream-QueueStream() (QueueStream.java:31)
java_vm_ext.cc:579] at java.lang.Object io.github.gedgygedgy.rust.stream.QueueStream$$ExternalSyntheticLambda4.get() (D8$$SyntheticClass:-1)
java_vm_ext.cc:579]
java_vm_ext.cc:579] in call to GetMethodID

> Also, observed NoSuchElementException while testing the BLE connections back to back.

Check if list is empty before remove.

This one is thrown by jni-utils; apparently the list is checked before the remove:
https://github.com/deviceplug/jni-utils-rs/blob/2a585382865e151c7aba0e0e8685fbb0d0fdff50/java/src/main/java/io/github/gedgygedgy/rust/stream/QueueStream.java#L30
Did you get this exception during the connection or during service discovery?

While iterating over the notifications after subscribing to the notify characteristic.

synchronized (this.lock) {
if (!this.result.isEmpty()) {
result = () -> () -> this.result.remove();

It looks like a lambda inside another lambda when removing the item. How should this be read?

The add() method does not use a synchronized block. Could that be the reason for this crash?

> looks like lamda on another lambda while removing the item. how to understand this ?

Probably the item should be removed inside the synchronized block and then returned by the lambda.
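
A minimal sketch of that idea, assuming a simplified queue rather than the real QueueStream: LockedQueue and its method names are hypothetical, and this is not the code from jni-utils-rs. The emptiness check and the removal happen under one lock acquisition, and add() takes the same lock, so a consumer can never see "not empty" and then find nothing to remove.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical queue-backed stream; not the jni-utils-rs QueueStream. */
final class LockedQueue<T> {
    private final Object lock = new Object();
    private final Queue<T> items = new ArrayDeque<>();

    /** Producers take the same lock as consumers. */
    void add(T item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    /**
     * Removes the head while holding the lock. poll() returns null on an
     * empty queue instead of throwing NoSuchElementException, and
     * ArrayDeque rejects null elements, so null always means "empty".
     */
    Optional<T> pollNext() {
        synchronized (lock) {
            return Optional.ofNullable(items.poll());
        }
    }
}
```

The difference from the quoted fragment is that the value is taken out inside the synchronized block and only the already-removed value leaves it, instead of a deferred lambda that calls remove() later, outside the lock.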

Just a heads up on development background for this:

jni-utils-rs and most of the android core was handed to me by an anonymous contributor who I've not been able to contact in about 2 years now. While they work, and I kind of understand what's going on in them, very little if any of it was developed by the lead devs here. So it's a bit difficult to reason on why things are the way they are in order to reply to these bugs.

That said, I'm definitely open to taking patches.

@qdot I'll make some tests and open a pr on jni-utils-rs if I can find a solution

Found another crash signature

I got this when the peripheral device was powered down while subscribing to the notify characteristic.

2023-06-19 23:39:02.928 25045-25706 btleplugex-jni com.example.btleplugex I btleplugex::discover_adapters_peripherals: Subscribing to characteristic Characteristic { uuid: 00002a37-0000-1000-8000-00805f9b34fb, service_uuid: 0000180d-0000-1000-8000-00805f9b34fb, properties: NOTIFY }
2023-06-19 23:39:07.930 25045-25706 BluetoothGatt com.example.btleplugex D setCharacteristicNotification() - uuid: 00002a37-0000-1000-8000-00805f9b34fb enable: true
2023-06-19 23:39:07.940 25045-25706 System.out com.example.btleplugex I Uma->setCommandCallback() callback: com.nonpolynomial.btleplug.android.impl.Peripheral$6@96b95c1
2023-06-19 23:39:09.839 25045-25062 System.out com.example.btleplugex I Uma->wakeWithThrowable(), result: java.lang.RuntimeException: Unable to write descriptor
2023-06-19 23:39:09.841 25045-25065 BluetoothGatt com.example.btleplugex D onClientConnectionState() - status=0 clientIf=15 device=FB:3C:36:23:FA:AE
2023-06-19 23:39:09.843 25045-25706 btleplugex-jni com.example.btleplugex D jni::wrapper::jnienv: exception found, returning error
2023-06-19 23:39:09.843 25045-25706 btleplugex-jni com.example.btleplugex D jni_utils::exceptions: Uma-> TryCatchResult catch
2023-06-19 23:39:09.843 25045-25706 btleplugex-jni com.example.btleplugex D jni_utils::exceptions: Uma-> TryCatchResult step 4, received Ok(Err(Error::JavaException)) type
2023-06-19 23:39:09.843 25045-25706 btleplugex-jni com.example.btleplugex D btleplug::droidplug::peripheral: Uma->unknown exception, re-throwing
2023-06-19 23:39:09.844 25045-25706 btleplugex-jni com.example.btleplugex E btleplugex: discover() returned error: Other(JavaException)
--------- beginning of crash
2023-06-19 23:39:09.844 25045-25706 btleplugex-jni com.example.btleplugex D btleplugex: exiting discover thread: ThreadId(30)
2023-06-19 23:39:09.846 25045-25706 AndroidRuntime com.example.btleplugex E FATAL EXCEPTION: Thread-22
Process: com.example.btleplugex, PID: 25045
io.github.gedgygedgy.rust.future.FutureException: java.lang.RuntimeException: Unable to write descriptor
    at io.github.gedgygedgy.rust.future.SimpleFuture.lambda$wakeWithThrowable$1(SimpleFuture.java:76)
    at io.github.gedgygedgy.rust.future.SimpleFuture$$ExternalSyntheticLambda1.get(Unknown Source:2)
Caused by: java.lang.RuntimeException: Unable to write descriptor
    at com.nonpolynomial.btleplug.android.impl.Peripheral$6.lambda$onDescriptorWrite$0$com-nonpolynomial-btleplug-android-impl-Peripheral$6(Peripheral.java:246)
    at com.nonpolynomial.btleplug.android.impl.Peripheral$6$$ExternalSyntheticLambda0.run(Unknown Source:10)
    at com.nonpolynomial.btleplug.android.impl.Peripheral.asyncWithFuture(Peripheral.java:324)
    at com.nonpolynomial.btleplug.android.impl.Peripheral.access$700(Peripheral.java:24)
    at com.nonpolynomial.btleplug.android.impl.Peripheral$6.onDescriptorWrite(Peripheral.java:244)
    at com.nonpolynomial.btleplug.android.impl.Peripheral$Callback.onDescriptorWrite(Peripheral.java:408)
    at android.bluetooth.BluetoothGatt$1$10.run(BluetoothGatt.java:636)
    at android.bluetooth.BluetoothGatt.runOrQueueCallback(BluetoothGatt.java:864)
    at android.bluetooth.BluetoothGatt.-$$Nest$mrunOrQueueCallback(Unknown Source:0)
    at android.bluetooth.BluetoothGatt$1.onDescriptorWrite(BluetoothGatt.java:631)
    at android.bluetooth.IBluetoothGattCallback$Stub.onTransact(IBluetoothGattCallback.java:234)
    at android.os.Binder.execTransactInternal(Binder.java:1285)
    at android.os.Binder.execTransact(Binder.java:1244)

Another crash, this time in the BluetoothGatt client. The application kept running, but Bluetooth scanning and further operations stopped working after this crash.

2023-06-19 11:58:19.416 30688-30708 BluetoothGatt com.example.btleplugex D onClientConnectionState() - status=8 clientIf=12 device=FB:3C:36:23:FA:AE
2023-06-19 11:58:19.448 30688-30708 BluetoothGatt com.example.btleplugex W Unhandled exception in callback
com.nonpolynomial.btleplug.android.impl.UnexpectedCallbackException
    at com.nonpolynomial.btleplug.android.impl.Peripheral$CommandCallback.onConnectionStateChange(Peripheral.java:402)
    at com.nonpolynomial.btleplug.android.impl.Peripheral$Callback.onConnectionStateChange(Peripheral.java:333)
    at android.bluetooth.BluetoothGatt$1$4.run(BluetoothGatt.java:272)
    at android.bluetooth.BluetoothGatt.runOrQueueCallback(BluetoothGatt.java:780)
    at android.bluetooth.BluetoothGatt.access$200(BluetoothGatt.java:41)
    at android.bluetooth.BluetoothGatt$1.onClientConnectionState(BluetoothGatt.java:267)
    at android.bluetooth.IBluetoothGattCallback$Stub.onTransact(IBluetoothGattCallback.java:192)
    at android.os.Binder.execTransactInternal(Binder.java:1021)
    at android.os.Binder.execTransact(Binder.java:994)

This could be due to the GATT client not being closed after the disconnect, with the possibility of hitting https://github.com/deviceplug/btleplug/blob/0.10.5/src/droidplug/java/src/main/java/com/nonpolynomial/btleplug/android/impl/Peripheral.java#L65 without a command callback being set.

After back-to-back connect/disconnect cycles, run sequentially in the same thread or in different threads, the connect/disconnect sequence broke after 23 iterations on a Pixel 4a. This was resolved by calling close() in the disconnected callback.
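
The shape of that fix can be sketched as below. This is an illustration only, not the code from the PR: GattHandle is a hypothetical stand-in for the one android.bluetooth.BluetoothGatt method used here, so the snippet compiles without the Android SDK, and ClosingCallback only mirrors the signature of BluetoothGattCallback.onConnectionStateChange.

```java
/** Hypothetical stand-in for the part of android.bluetooth.BluetoothGatt used here. */
interface GattHandle {
    void close();
}

/** Sketch of a callback that releases the GATT client once it is disconnected. */
final class ClosingCallback {
    // Same value as android.bluetooth.BluetoothProfile.STATE_DISCONNECTED.
    static final int STATE_DISCONNECTED = 0;

    private boolean closed;

    /** Mirrors the shape of BluetoothGattCallback.onConnectionStateChange. */
    void onConnectionStateChange(GattHandle gatt, int status, int newState) {
        if (newState == STATE_DISCONNECTED && !closed) {
            gatt.close(); // release the client instead of leaving it open
            closed = true; // guard against closing the same handle twice
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

On a device, the same check would sit in the real callback, with gatt being the BluetoothGatt instance passed to it.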

> Also, observed NoSuchElementException while testing the BLE connections back to back.

> Check if list is empty before remove.

@qdot @blandger
I added a synchronized block around adding an item to QueueStream.
Please review the PR in the jni-utils-rs repo:
deviceplug/jni-utils-rs#3

@qdot @blandger @trik please help review #318 to fix this issue

@ulankoti it makes sense; it's more or less the same solution I was working on. I will test it and get back with some comments.

In the meantime, I have also been working on a refactoring of the whole droidplug module. Relying on an external library that is no longer maintained is probably not a good idea. I'm trying j4rs, and the first results look promising. It is less tricky to integrate with Flutter or similar (I'm using Tauri for my tests), since there is no need to set the Java thread context class loader when spawning tokio threads. It also has a simpler API, supports Rust-to-Java async calls using CompletableFutures with significant performance benefits compared to the polling strategy, and, last but not least, has an active community.

> in the meantime, i was also working on a refactoring of all the droidplug module

Oh that's fantastic to hear! Definitely excited to see how this turns out. :D

@qdot if you have the time, this is the branch I'm currently working on: https://github.com/trik/btleplug/tree/j4rs

src/droidplug contains the current code, which I'm keeping just for reference;
src/droidplugnext contains the new implementation.

So far, the only feature available is scanning (start/stop with basic properties discovery, no characteristics).
You can build the Android support library with cargo build --features android-support-library, which generates both the debug and release AARs in the manifest directory.

On the Java side, we just need to add these dependencies to the Gradle script:

    implementation("io.github.astonbitecode:j4rs:0.16.1")
    implementation(files("PATH_TO_SUPPORT_AAR"))

Inside the android section we need to add this:

    packaging {
        resources.excludes.add("META-INF/INDEX.LIST")
    }

because the j4rs library and its transitive dependencies are badly packaged and contain multiple meta files.

On the Rust side, we only need to use the droidplug_init macro exported by the new module, which generates the JNI_OnLoad function with all the necessary j4rs/btleplug initialization.

Ok, so we've got competing PR's happening here, between #315 and #318. I brought in #315 to our dev branch having not realized there was another on top of that. It looks like #315 was a small subset of the fixes in #318, so I've got two options here: have #318 rebased by @ulankoti to deal with #315 already being in dev, or back out #315 and just bring in #318. Since everyone involved is on this thread and I am mostly playing dumb repo manager right now, I'll let y'all hash this out and tell me what to do.

Mine was just a partial fix; @ulankoti went further and fixed other problems. I had no time to test his fixes, but I reviewed the code and it looks good.

Hi @qdot

Please let me know if I need to rebase PR #318 onto the dev branch. I will resolve the conflicts and rebase the PR.
Earlier I created the PR against the master branch.

I was not aware that the fixes had to be pushed to the dev branch.

Thank you @trik for your review.

Hi @qdot please help review #318

Gonna try to kick out a new version in the next few days with this in it.

btleplug v0.11.0 and jni-utils 0.1.1 are now live. I ran quick tests on windows and android (though didn't try anything involving the new descriptor stuff, this was just smoke testing what I currently use things for), everything seems happy.