Advertisement strings that aren't UTF-8 compliant throw silent errors on Android

Question

qdot opened this issue 6 months ago

qDot commented 6 months ago

Describe the bug

1. Find a device whose advertisement name is an invalid UTF-8 string
2. Using a btleplug program, scan for the device on Android

Expected behavior
Valid portion of UTF-8 name should appear.

Actual behavior
Device is never found

Additional context
This happens on https://github.com/rib/bluey too, see rib/bluey#1

Robert Bragg · Answer 1 · Mon Nov 20 2023 05:12:06 GMT+0800 (China Standard Time)

I also wonder whether the Java String was not being decoded as 'Modified UTF-8' via jni-rs in btleplug, just as in bluey.

jni-rs can Deref a JavaStr into a JNIStr which has this From trait implementation:

impl<'str_ref> From<&'str_ref JNIStr> for Cow<'str_ref, str> {
    fn from(other: &'str_ref JNIStr) -> Cow<'str_ref, str> {
        let bytes = other.to_bytes();
        match from_java_cesu8(bytes) {
            Ok(s) => s,
            Err(e) => {
                debug!("error decoding java cesu8: {:#?}", e);
                String::from_utf8_lossy(bytes)
            }
        }
    }
}

and it looks like that is the only path that would correctly decode a JNIStr as a Java Modified UTF8 string.

Surprisingly it looks like a JNIStr will itself deref into a CStr which implements to_str() and I think we were both unwittingly calling CStr::to_str() which would have bypassed the from_java_cesu8 conversion to handle Java's Modified UTF8 quirks.

I'm not sure why JNIStr allows code to deref directly into a CStr - that seems like a pretty big foot gun :/
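To illustrate the mechanism described above, here is a minimal std-only sketch. `WrappedStr` and `strict_decode_fails` are hypothetical names invented for this example; `WrappedStr` is only a stand-in for a JNIStr-like type and is not the jni-rs implementation. It shows how a wrapper that derefs into CStr lets a `.to_str()` call silently resolve to the strict `CStr::to_str()`:

```rust
use std::ffi::CStr;
use std::ops::Deref;

/// Hypothetical stand-in for a JNIStr-like wrapper (NOT the jni-rs type).
struct WrappedStr<'a>(&'a CStr);

impl<'a> Deref for WrappedStr<'a> {
    type Target = CStr;

    fn deref(&self) -> &CStr {
        self.0
    }
}

/// Returns true when the strict UTF-8 decode reached through auto-deref fails.
/// `raw_with_nul` must end with exactly one NUL terminator.
fn strict_decode_fails(raw_with_nul: &[u8]) -> bool {
    let cstr = CStr::from_bytes_with_nul(raw_with_nul)
        .expect("input must contain exactly one trailing NUL");
    let wrapped = WrappedStr(cstr);
    // WrappedStr defines no to_str() of its own, so method lookup
    // auto-derefs to CStr::to_str(), which validates strict UTF-8.
    wrapped.to_str().is_err()
}
```

With the bytes 0xC0 0x80 (Java's Modified UTF-8 encoding of NUL) embedded in the name, the strict decode fails, while a plain ASCII name passes.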

It also seems like JNIStr should implement a to_str() method itself, based on from_java_cesu8

I'd be interested to know the raw bytes you saw, to see if it maybe would have been decoded properly if only it had gone via from_java_cesu8().

Or maybe you could also test converting the device name to a Rust str with something that more-explicitly goes via the existing From<&'str_ref JNIStr> for Cow<'str_ref, str> trait, like:

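// Borrow the Java string's Modified UTF-8 bytes.
let name_jstr = JavaStr::from_env(result.env, device_name_obj)?;
// Deref the JavaStr into a &JNIStr.
let name_jnistr: &JNIStr = &name_jstr;
// Convert via From<&JNIStr> for Cow<str>, which goes through from_java_cesu8.
let device_name_str: Cow<str> = name_jnistr.into();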

At the very least it seems like this has highlighted a few jni-rs API issues that should be addressed.

Robert Bragg · Answer 2 · Mon Nov 20 2023 06:30:12 GMT+0800 (China Standard Time)

For reference here; based on an example problem name provided by @qdot I experimented with the following unit test:

#[test]
fn modified_utf8_decode() {
    let bytes: [u8; 10] = [76, 86, 83, 45, 72, 48, 49, 192, 128, 0];
    let jnistr = unsafe { JNIStr::from_ptr(bytes.as_ptr() as _) };
    let s: Cow<str> = jnistr.into(); // OK: LVS-H01
//  let s = jnistr.to_str().unwrap(); // PANIC: thread 'wrapper::strings::ffi_str::modified_utf8_decode' panicked at src\wrapper\strings\ffi_str.rs:117:29:
                                      // called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 7, error_len: Some(1) }
    println!("decoded = {s}");
    assert_eq!(&*s, "LVS-H01\0");
}

which demonstrates that the name would have parsed if it had been decoded via from_java_cesu8(bytes).
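The difference can also be reproduced without jni-rs. The following is a rough std-only sketch, not the `from_java_cesu8` implementation: `decode_mutf8_nul_only` and `compare_decoders` are hypothetical helpers that only handle the two-byte NUL encoding (0xC0 0x80) seen in this name, and ignore the surrogate-pair part of Modified UTF-8 entirely.

```rust
use std::ffi::CStr;

/// Hypothetical helper (not part of jni-rs): rewrites the Modified UTF-8
/// encoding of NUL (0xC0 0x80) to a single 0x00 byte, then decodes lossily.
/// Surrogate pairs encoded CESU-8 style are NOT handled here.
fn decode_mutf8_nul_only(bytes: &[u8]) -> String {
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0xC0 && bytes.get(i + 1) == Some(&0x80) {
            out.push(0);
            i += 2;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Returns (whether the strict CStr::to_str() decode fails, tolerant decode).
/// `raw_with_nul` must end with exactly one NUL terminator.
fn compare_decoders(raw_with_nul: &[u8]) -> (bool, String) {
    let cstr = CStr::from_bytes_with_nul(raw_with_nul)
        .expect("input must contain exactly one trailing NUL");
    (
        cstr.to_str().is_err(),
        decode_mutf8_nul_only(cstr.to_bytes()),
    )
}
```

Calling `compare_decoders` on the ten bytes from the unit test reports a strict-decode failure alongside a tolerant result of "LVS-H01" followed by an embedded NUL character.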

So an alternative workaround here would be to explicitly decode with an intermediate step like: let name: Cow<str> = name.into().

Really though, JNIStr should not allow code to deref into CStr, and JNIStr should implement to_str() in terms of from_java_cesu8.