drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with hunspell and an nlprule-based grammar checker

Panic hunspell returned non-utf8 sequence

lopopolo opened this issue

Describe the bug

Panic.

To Reproduce

Steps to reproduce the behaviour:

  1. Check out https://github.com/artichoke/focaccia/tree/lopopolo/spellcheck.
  2. Run cargo spellcheck fix.
  3. Observe a panic: drahnr/hunspell-rs#2
$ RUST_BACKTRACE=1 cargo spellcheck fix
The application panicked (crashed).
Message:  called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 2, error_len: Some(1) }
Location: /Users/lopopolo/.cargo/registry/src/github.com-1ecc6299db9ec823/hunspell-rs-0.3.0/src/lib.rs:91

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 9 frames hidden ⋮
  10: hunspell_rs::Hunspell::suggest::he42e61b8853974bb
      at <unknown source file>:<unknown line>
  11: cargo_spellcheck::checker::hunspell::obtain_suggestions::ha13e18bbfdddf950
      at <unknown source file>:<unknown line>
  12: <cargo_spellcheck::checker::hunspell::HunspellChecker as cargo_spellcheck::checker::Checker>::check::h6bb5466a5c25716d
      at <unknown source file>:<unknown line>
  13: <cargo_spellcheck::checker::Checkers as cargo_spellcheck::checker::Checker>::check::h5e084552889146be
      at <unknown source file>:<unknown line>
  14: <futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next::hdaac1a8931af86a9
      at <unknown source file>:<unknown line>
  15: <futures_util::stream::stream::buffered::Buffered<St> as futures_core::stream::Stream>::poll_next::h4007d8b12d3afdf7
      at <unknown source file>:<unknown line>
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h186af277884a5efb
      at <unknown source file>:<unknown line>
  17: std::thread::local::LocalKey<T>::with::h01c6186788c0ace4
      at <unknown source file>:<unknown line>
  18: tokio::park::thread::CachedParkThread::block_on::h062e5aef35cfdc1d
      at <unknown source file>:<unknown line>
  19: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::h32297a17d20f2ba0
      at <unknown source file>:<unknown line>
  20: tokio::runtime::Runtime::block_on::haa3e8e0f204758fa
      at <unknown source file>:<unknown line>
  21: cargo_spellcheck::run::h1976c2df04655249
      at <unknown source file>:<unknown line>
  22: cargo_spellcheck::main::hee18b2a6ff3b422f
      at <unknown source file>:<unknown line>
  23: std::sys_common::backtrace::__rust_begin_short_backtrace::he9daf82e1259c901
      at <unknown source file>:<unknown line>
  24: std::rt::lang_start::{{closure}}::h94cd7b19a83654e0
      at <unknown source file>:<unknown line>
  25: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h7b036f15aca60adb
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:280
  26: std::panicking::try::do_call::hf6119ec0466800e8
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492
  27: std::panicking::try::hcda27a2b6f836f01
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456
  28: std::panic::catch_unwind::hde37ab35642f072b
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137
  29: std::rt::lang_start_internal::{{closure}}::h103d9f9a51ce5b21
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/rt.rs:128
  30: std::panicking::try::do_call::h0e10440d51723322
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:492
  31: std::panicking::try::h738bcf26bd63f912
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:456
  32: std::panic::catch_unwind::hc9eba21b74d8966b
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panic.rs:137
  33: std::rt::lang_start_internal::h3fd5cff071397f19
      at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/rt.rs:128
  34: _main<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.

Running cargo-spellcheck on trunk branch works as expected.

Expected behavior

hunspell does not attempt to process files that it can't parse as valid UTF-8:

https://github.com/drahnr/hunspell-rs/blob/25ea962c9fad157165cde9fbb9c3cd322be64737/src/lib.rs#L36

Please complete the following information:

  • System: macOS
  • Obtained: cargo
  • Version: cargo-spellcheck 0.12.0

Additional context

Since version 0.12.0, .gitignore files are respected, so as a workaround you could list the offending file there.

The referenced code hints that hunspell (the C library itself) returns invalid results. I'd buckle up - it might be a while before that is fixed. Could you upload the offending file triggering the behavior?

The branch here reliably panics for me with 0.12.0: https://github.com/artichoke/focaccia/tree/lopopolo/spellcheck

Could you provide your native hunspell lib version?

I'm not sure how to do that. I'm on a mac and I don't think I've installed hunspell.

I did a cargo install cargo-spellcheck today. I don't think I passed --locked.

That's sufficient info, thanks!

It doesn't crash for me with 0.12.0, I guess that's because I am on Fedora/Linux.

Can reproduce on Fedora with 0.12.0.

So I released 0.3.1 of hunspell-rs; it will no longer panic but will print some context instead. Note that this uses eprintln! for printing. 0.4.0, which will be released soon, will return Result values from a few API calls, which should remedy the issue itself.
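For illustration, here is a minimal sketch of the difference between unwrapping the UTF-8 conversion and handling its error. This is not the hunspell-rs code: decode_strict and decode_lossy are hypothetical names, and the input bytes are made up so that they produce the same Utf8Error shape as in the report (valid_up_to: 2, error_len: Some(1)).

```rust
use std::str::Utf8Error;

/// Strict decoding: hand the error back to the caller instead of unwrapping.
/// (Hypothetical helper, roughly the "return a Result" approach.)
fn decode_strict(raw: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(raw)
}

/// Lossy decoding: never panics. Invalid sequences are replaced with U+FFFD
/// and some context is printed to stderr.
/// (Hypothetical helper, roughly the "print context" approach.)
fn decode_lossy(raw: &[u8]) -> String {
    match std::str::from_utf8(raw) {
        Ok(s) => s.to_owned(),
        Err(e) => {
            eprintln!("non-UTF-8 bytes {:?}: {}", raw, e);
            String::from_utf8_lossy(raw).into_owned()
        }
    }
}
```

With the made-up input [b'a', b'b', 0xFF, b'c'], decode_strict returns an Err whose valid_up_to() is 2 and whose error_len() is Some(1), while decode_lossy returns the string with the bad byte replaced by U+FFFD.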

Another possible fix is to use the host hunspell library instead of the bundled one; 1.7.1 contains a few memory fixes that might resolve this (untested).

Note that this smells like an off-by-one error or memory corruption inside hunspell. There have been a few instances like this where adding locks made the issue disappear. I'm tempted to go down the rabbit hole of compiling it to wasm and transpiling it back as part of build.rs, to avoid any kind of memory madness, in the spirit of Firefox.
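As a rough sketch of what "adding locks" could look like on the Rust side: every call into a handle that may not be thread-safe goes through a single mutex. The Speller type here is a hypothetical stand-in, not the hunspell-rs API, and its suggest method only uppercases the word so the example stays self-contained.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a handle to a C library that is not thread-safe.
struct Speller {
    calls: usize,
}

impl Speller {
    /// Takes `&mut self`, so callers need exclusive access to the handle.
    fn suggest(&mut self, word: &str) -> Vec<String> {
        self.calls += 1;
        vec![word.to_uppercase()]
    }
}

/// Calls `suggest` for each word on its own thread. The mutex ensures that
/// only one thread is inside the (assumed unsafe) library at a time.
fn suggest_all(words: &[&str]) -> (Vec<Vec<String>>, usize) {
    let speller = Arc::new(Mutex::new(Speller { calls: 0 }));
    let handles: Vec<_> = words
        .iter()
        .map(|w| {
            let speller = Arc::clone(&speller);
            let word = w.to_string();
            thread::spawn(move || {
                let mut guard = speller.lock().unwrap();
                guard.suggest(&word)
            })
        })
        .collect();
    // Join in spawn order so the results line up with the input words.
    let results = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let calls = speller.lock().unwrap().calls;
    (results, calls)
}
```

Serializing calls like this trades throughput for safety, and it would only hide a bug that lives inside the C library rather than fix it.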