Panic hunspell returned non-utf8 sequence
lopopolo opened this issue
Describe the bug
Panic.
To Reproduce
Steps to reproduce the behaviour:
- Checkout https://github.com/artichoke/focaccia/tree/lopopolo/spellcheck.
- Run cargo spellcheck fix.
- Observe a panic: drahnr/hunspell-rs#2
$ RUST_BACKTRACE=1 cargo spellcheck fix
The application panicked (crashed).
Message: called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 2, error_len: Some(1) }
Location: /Users/lopopolo/.cargo/registry/src/github.com-1ecc6299db9ec823/hunspell-rs-0.3.0/src/lib.rs:91
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⋮ 9 frames hidden ⋮
10: hunspell_rs::Hunspell::suggest::he42e61b8853974bb
at <unknown source file>:<unknown line>
11: cargo_spellcheck::checker::hunspell::obtain_suggestions::ha13e18bbfdddf950
at <unknown source file>:<unknown line>
12: <cargo_spellcheck::checker::hunspell::HunspellChecker as cargo_spellcheck::checker::Checker>::check::h6bb5466a5c25716d
at <unknown source file>:<unknown line>
13: <cargo_spellcheck::checker::Checkers as cargo_spellcheck::checker::Checker>::check::h5e084552889146be
at <unknown source file>:<unknown line>
14: <futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next::hdaac1a8931af86a9
at <unknown source file>:<unknown line>
15: <futures_util::stream::stream::buffered::Buffered<St> as futures_core::stream::Stream>::poll_next::h4007d8b12d3afdf7
at <unknown source file>:<unknown line>
16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h186af277884a5efb
at <unknown source file>:<unknown line>
17: std::thread::local::LocalKey<T>::with::h01c6186788c0ace4
at <unknown source file>:<unknown line>
18: tokio::park::thread::CachedParkThread::block_on::h062e5aef35cfdc1d
at <unknown source file>:<unknown line>
19: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::h32297a17d20f2ba0
at <unknown source file>:<unknown line>
20: tokio::runtime::Runtime::block_on::haa3e8e0f204758fa
at <unknown source file>:<unknown line>
21: cargo_spellcheck::run::h1976c2df04655249
at <unknown source file>:<unknown line>
22: cargo_spellcheck::main::hee18b2a6ff3b422f
at <unknown source file>:<unknown line>
23: std::sys_common::backtrace::__rust_begin_short_backtrace::he9daf82e1259c901
at <unknown source file>:<unknown line>
24: std::rt::lang_start::{{closure}}::h94cd7b19a83654e0
at <unknown source file>:<unknown line>
25: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h7b036f15aca60adb
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:280
26: std::panicking::try::do_call::hf6119ec0466800e8
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492
27: std::panicking::try::hcda27a2b6f836f01
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456
28: std::panic::catch_unwind::hde37ab35642f072b
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137
29: std::rt::lang_start_internal::{{closure}}::h103d9f9a51ce5b21
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/rt.rs:128
30: std::panicking::try::do_call::h0e10440d51723322
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492
31: std::panicking::try::h738bcf26bd63f912
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456
32: std::panic::catch_unwind::hc9eba21b74d8966b
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137
33: std::rt::lang_start_internal::h3fd5cff071397f19
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/rt.rs:128
34: _main<unknown>
at <unknown source file>:<unknown line>
Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.
Running cargo-spellcheck on the trunk branch works as expected.
Expected behavior
hunspell does not attempt to process files that it can't parse as valid UTF-8:
https://github.com/drahnr/hunspell-rs/blob/25ea962c9fad157165cde9fbb9c3cd322be64737/src/lib.rs#L36
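For readers unfamiliar with the fields in the panic message, here is a minimal standalone sketch of how a Utf8Error with valid_up_to: 2 and error_len: Some(1) arises. The byte sequence and the helper names decode and describe are made up for illustration; the actual bytes hunspell returned are not known from this report.

```rust
use std::str::{from_utf8, Utf8Error};

/// Try to decode raw bytes as UTF-8, returning the error instead of panicking.
fn decode(bytes: &[u8]) -> Result<&str, Utf8Error> {
    from_utf8(bytes)
}

/// Describe the outcome using the same two fields that appear in the panic message.
fn describe(bytes: &[u8]) -> String {
    match decode(bytes) {
        Ok(text) => format!("valid: {}", text),
        Err(e) => format!(
            "valid_up_to: {}, error_len: {:?}",
            e.valid_up_to(), // number of leading bytes that decoded cleanly
            e.error_len()    // length of the invalid sequence, if known
        ),
    }
}
```

Calling describe(&[b'a', b'b', 0xff, b'c']) yields "valid_up_to: 2, error_len: Some(1)": the first two bytes are valid and are followed by one byte that can never appear in UTF-8.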
Please complete the following information:
- System: macOS
- Obtained: cargo
- Version: cargo-spellcheck 0.12.0
Additional context
Since version 0.12.0, .gitignore files are respected, so you could list the offending file there as a workaround.
The referenced code hints that hunspell (the C library itself) returns invalid results. I'd buckle up - it might be a while before that is fixed. Could you upload the offending file triggering the behavior?
The branch here reliably panics for me with 0.12.0: https://github.com/artichoke/focaccia/tree/lopopolo/spellcheck
Could you provide your native hunspell lib version?
I'm not sure how to do that. I'm on a Mac and I don't think I've installed hunspell.
I did a cargo install cargo-spellcheck today. I don't think I passed --locked.
That's sufficient info, thanks!
It doesn't crash for me with 0.12.0, I guess that's because I am on Fedora/Linux.
Can reproduce on Fedora with 0.12.0.
So I released 0.3.1 of hunspell-rs; it no longer panics but prints some context instead. Note that it uses eprintln! for printing. 0.4.0, which will be released soon, will return results from a few API calls instead and should mitigate the issue itself.
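As a rough illustration of the two approaches described above (report via eprintln! and carry on, versus returning a Result to the caller), here is a minimal sketch. The helpers suggestion_to_string and collect_suggestions are hypothetical and are not the hunspell-rs API; they only assume that each suggestion arrives from the C side as a NUL-terminated string.

```rust
use std::ffi::CStr;
use std::str::Utf8Error;

/// Hypothetical helper, not part of hunspell-rs: convert one C string into
/// an owned String, surfacing invalid UTF-8 as an error instead of unwrapping.
fn suggestion_to_string(raw: &CStr) -> Result<String, Utf8Error> {
    raw.to_str().map(str::to_owned)
}

/// Hypothetical helper: keep the suggestions that decode cleanly, and print
/// some context to stderr for the ones that do not.
fn collect_suggestions(raw: &[&CStr]) -> Vec<String> {
    raw.iter()
        .filter_map(|&c_str| match suggestion_to_string(c_str) {
            Ok(text) => Some(text),
            Err(e) => {
                eprintln!(
                    "skipping non-UTF-8 suggestion {:?}: {}",
                    c_str.to_bytes(),
                    e
                );
                None
            }
        })
        .collect()
}
```

With this shape, a single malformed suggestion costs one skipped entry and a line on stderr rather than aborting the whole run. A caller that wants to decide for itself can use suggestion_to_string directly and handle the Err case.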
Another possible fix is to use the host hunspell library instead of the bundled one; 1.7.1 contains a few memory fixes that might include a fix for this, but that is untested.
Note that this smells like an off-by-one error or memory corruption inside hunspell. There have been a few instances like this where adding locks made the issue disappear. I'm tempted to go down the rabbit hole of compiling it to wasm and transpiling it back as part of build.rs to avoid any kind of memory madness, in the spirit of Firefox.
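To make the "adding locks" remark concrete, here is a minimal sketch of serializing access to a checker behind a Mutex so that only one thread calls into it at a time. FakeChecker is a stand-in invented for this example, not the real hunspell handle, and suggest_all is a hypothetical driver. Whether a lock like this addresses the bug in this issue is untested.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for a native handle that is not safe to call concurrently
/// (hypothetical; the real hunspell handle is not modelled here).
struct FakeChecker {
    calls: usize,
}

impl FakeChecker {
    fn suggest(&mut self, word: &str) -> Vec<String> {
        self.calls += 1;
        vec![word.to_uppercase()]
    }
}

/// Ask for suggestions from several threads, with every call going through
/// one shared lock. Returns the suggestions in input order and the call count.
fn suggest_all(words: Vec<String>) -> (Vec<String>, usize) {
    let checker = Arc::new(Mutex::new(FakeChecker { calls: 0 }));

    let handles: Vec<_> = words
        .into_iter()
        .map(|word| {
            let checker = Arc::clone(&checker);
            thread::spawn(move || {
                // The guard is held for the whole call, so calls never overlap.
                let mut guard = checker.lock().expect("checker mutex poisoned");
                guard.suggest(&word)
            })
        })
        .collect();

    let mut suggestions = Vec::new();
    for handle in handles {
        suggestions.extend(handle.join().expect("worker thread panicked"));
    }

    let calls = checker.lock().expect("checker mutex poisoned").calls;
    (suggestions, calls)
}
```

The trade-off is that all suggestion lookups run one at a time, which gives up parallelism in exchange for never entering the native code from two threads at once.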