Advertisement strings that aren't UTF-8 compliant throw silent errors on Android
qdot opened this issue · comments
Describe the bug
- Find a device with an advertisement name that is an invalid UTF-8 string
- Using a btleplug program, scan for device on Android
Expected behavior
Valid portion of UTF-8 name should appear.
Actual behavior
Device is never found
Additional context
This happens on https://github.com/rib/bluey too, see rib/bluey#1
I also wonder if jni-rs might not have been ensuring it decoded the Java String as 'Modified UTF-8' in btleplug as well as bluey.
jni-rs can Deref a JavaStr into a JNIStr, which has this From trait implementation:
impl<'str_ref> From<&'str_ref JNIStr> for Cow<'str_ref, str> {
    fn from(other: &'str_ref JNIStr) -> Cow<'str_ref, str> {
        let bytes = other.to_bytes();
        match from_java_cesu8(bytes) {
            Ok(s) => s,
            Err(e) => {
                debug!("error decoding java cesu8: {:#?}", e);
                String::from_utf8_lossy(bytes)
            }
        }
    }
}
and it looks like that is the only path that would correctly decode a JNIStr as a Java Modified UTF-8 string.
Surprisingly, it looks like a JNIStr will itself deref into a CStr, which implements to_str(), and I think we were both unwittingly calling CStr::to_str(), which would have bypassed the from_java_cesu8 conversion that handles Java's Modified UTF-8 quirks.
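As a minimal sketch of why that path fails, the following uses only the standard library's CStr (no jni-rs types) on the byte sequence from the problem name discussed in this issue. The names NAME_BYTES, strict_decode and lossy_decode are made up for illustration and are not part of jni-rs or btleplug.

```rust
use std::ffi::CStr;
use std::str::Utf8Error;

/// Bytes of the problem device name: "LVS-H01", then an embedded NUL
/// written as 0xC0 0x80 (the Modified UTF-8 form), then the C terminator.
pub const NAME_BYTES: [u8; 10] = [76, 86, 83, 45, 72, 48, 49, 192, 128, 0];

/// Strict decode, equivalent to calling CStr::to_str() directly.
/// 0xC0 is never a valid byte in standard UTF-8, so this returns an error.
pub fn strict_decode(bytes: &[u8]) -> Result<&str, Utf8Error> {
    let cstr = CStr::from_bytes_with_nul(bytes).expect("exactly one trailing NUL");
    cstr.to_str()
}

/// Lossy decode: keeps the valid prefix and substitutes U+FFFD for each
/// invalid byte instead of failing.
pub fn lossy_decode(bytes: &[u8]) -> String {
    let cstr = CStr::from_bytes_with_nul(bytes).expect("exactly one trailing NUL");
    cstr.to_string_lossy().into_owned()
}
```

With these bytes, strict_decode reports an error that is valid up to byte 7, so a caller that discards the error never sees the device name, while lossy_decode still yields the readable "LVS-H01" prefix.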
I'm not sure why JNIStr allows code to deref directly into a CStr - that seems like a pretty big foot gun :/
It also seems like JNIStr should implement a to_str() method itself, based on from_java_cesu8.
I'd be interested to know the raw bytes you saw, to see if it maybe would have been decoded properly if only it had gone via from_java_cesu8().
Or maybe you could also test converting the device name to a Rust str with something that more explicitly goes via the existing From<&'str_ref JNIStr> for Cow<'str_ref, str> trait, like:
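// Borrow the Java string's Modified UTF-8 bytes from the JVM.
let name_jstr = JavaStr::from_env(result.env, device_name_obj)?;
// Deref to JNIStr explicitly so the From impl above is the one selected.
let name_jnistr: &JNIStr = &name_jstr;
// Decode via from_java_cesu8 rather than CStr::to_str().
let device_name_str: Cow<str> = name_jnistr.into();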
At the very least it seems like this has highlighted a few jni-rs API issues that should be addressed.
For reference here: based on an example problem name provided by @qdot, I experimented with the following unit test:
#[test]
fn modified_utf8_decode() {
    let bytes: [u8; 10] = [76, 86, 83, 45, 72, 48, 49, 192, 128, 0];
    let jnistr = unsafe { JNIStr::from_ptr(bytes.as_ptr() as _) };
    let s: Cow<str> = jnistr.into(); // OK: LVS-H01
    // let s = jnistr.to_str().unwrap(); // PANIC: thread 'wrapper::strings::ffi_str::modified_utf8_decode' panicked at src\wrapper\strings\ffi_str.rs:117:29:
    // called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 7, error_len: Some(1) }
    println!("decoded = {s}");
    assert_eq!(&*s, "LVS-H01\0");
}
which demonstrates that the name would have parsed if it had been decoded via from_java_cesu8(bytes).
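To illustrate which quirk is involved for this particular name, here is a simplified, standard-library-only sketch. Modified UTF-8 writes an embedded NUL as the two bytes 0xC0 0x80, which strict UTF-8 rejects. The helper decode_nul_quirk is hypothetical and handles only that NUL case. It does not handle the other Modified UTF-8 difference (supplementary characters stored as encoded surrogate pairs), so it is not a substitute for from_java_cesu8.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical helper: decode bytes whose only Modified UTF-8 quirk is
/// NUL written as 0xC0 0x80. Encoded surrogate pairs are NOT handled.
pub fn decode_nul_quirk(bytes: &[u8]) -> Result<Cow<'_, str>, Utf8Error> {
    // Fast path: already valid standard UTF-8, so borrow without copying.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Ok(Cow::Borrowed(s));
    }
    // Slow path: rewrite each 0xC0 0x80 pair to a single 0x00 byte.
    // 0xC0 never appears in valid UTF-8, so this cannot corrupt valid text.
    let mut fixed = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0xC0 && bytes.get(i + 1) == Some(&0x80) {
            fixed.push(0);
            i += 2;
        } else {
            fixed.push(bytes[i]);
            i += 1;
        }
    }
    // Anything still invalid (e.g. encoded surrogates) is reported as an error.
    String::from_utf8(fixed)
        .map(Cow::Owned)
        .map_err(|e| e.utf8_error())
}
```

Passing the nine name bytes from the test above (without the trailing C terminator) yields "LVS-H01" followed by a NUL character, matching what the From conversion produced there.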
So an alternative workaround here would be to explicitly decode with an intermediate step like: let name: Cow<str> = name.into().
Really though, JNIStr should not allow code to deref into CStr, and JNIStr should implement to_str() in terms of from_java_cesu8.