drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with Hunspell and an nlprule-based grammar checker.


Word "C++" is tokenized incorrectly and cannot be whitelisted

ravenexp opened this issue

Describe the bug

It is not possible to whitelist the word "C++" by adding it to the local Hunspell dictionary.

Adding "^[cC][+][+]$" to the transform_regex list also does not help.

To Reproduce

Steps to reproduce the behaviour:

  1. Create a file containing the word "C++".
  2. Add "C++" into the local Hunspell dictionary.
  3. Run cargo spellcheck ....
  4. A spelling error message is displayed for every "+" in "C++".

Expected behavior

Hunspell finds "C++" in the local dictionary and accepts it as correct.

Screenshots

error: spellcheck(Hunspell)
    --> /home/x/y.md:252
     |
 252 | Specifically, the GNU C++ compiler version 8.2 or newer and
     |                        ^
     |   Possible spelling mistake found.
error: spellcheck(Hunspell)
    --> /home/x/y.md:252
     |
 252 | Specifically, the GNU C++ compiler version 8.2 or newer and
     |                         ^
     |   Possible spelling mistake found.
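The two reports point at the two "+" characters, not at "C++" as a whole. A minimal sketch of why, assuming a toy tokenizer (not the third-party tokenizer cargo-spellcheck actually uses) that keeps runs of alphanumeric characters together and emits every other non-whitespace character as its own token; the dictionary contents and the helpers `tokenize` and `misspelled` are hypothetical:

```rust
use std::collections::HashSet;

/// Toy tokenizer (assumption, not the real one): runs of alphanumeric
/// characters form one token; every other non-whitespace character
/// becomes a token of its own.
fn tokenize(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c);
            continue;
        }
        // Any non-alphanumeric character ends the current word.
        if !word.is_empty() {
            tokens.push(std::mem::take(&mut word));
        }
        if !c.is_whitespace() {
            tokens.push(c.to_string());
        }
    }
    if !word.is_empty() {
        tokens.push(word);
    }
    tokens
}

/// Returns every token that is not found in the dictionary.
fn misspelled(text: &str, dictionary: &HashSet<&str>) -> Vec<String> {
    tokenize(text)
        .into_iter()
        .filter(|t| !dictionary.contains(t.as_str()))
        .collect()
}

fn main() {
    // "C++" is in this hypothetical local dictionary, but the tokenizer
    // never produces a "C++" token, so that entry is never consulted.
    let dictionary: HashSet<&str> = ["the", "GNU", "C", "C++", "compiler"]
        .iter()
        .copied()
        .collect();
    let text = "the GNU C++ compiler";
    println!("{:?}", tokenize(text)); // ["the", "GNU", "C", "+", "+", "compiler"]
    println!("{:?}", misspelled(text, &dictionary)); // ["+", "+"]
}
```

In this model the lookup only ever sees "C", "+" and "+", so neither the "C++" dictionary entry nor a whole-word pattern such as "^[cC][+][+]$" has anything to match.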

Please complete the following information:

  • System: Arch Linux
  • Obtained: pacman
  • Version: cargo-spellcheck 0.11.2

Oh, I've accidentally found a workaround while figuring out how to make cargo-spellcheck not complain about "—" (EM-DASH).

Adding

transform_regex = [..., "^[+]$"]

to the config makes cargo-spellcheck accept "C++" as a correct word.
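The workaround can be modeled the same way. This sketch assumes that a transform_regex entry without capture groups accepts any token it matches in full; since Rust's standard library has no regex engine, the pattern "^[+]$" is written out as the hypothetical predicate `plus_only`, and `reported` is likewise an illustrative helper, not cargo-spellcheck's API:

```rust
use std::collections::HashSet;

/// Stand-in for the transform_regex entry "^[+]$" (assumption: a token
/// matched in full is accepted without a dictionary lookup).
fn plus_only(token: &str) -> bool {
    token == "+"
}

/// Returns the tokens that would still be reported: not accepted by any
/// quirk predicate and not found in the dictionary.
fn reported<'a>(
    tokens: &[&'a str],
    dictionary: &HashSet<&str>,
    quirks: &[fn(&str) -> bool],
) -> Vec<&'a str> {
    tokens
        .iter()
        .copied()
        .filter(|&t| !quirks.iter().any(|q| q(t)))
        .filter(|&t| !dictionary.contains(t))
        .collect()
}

fn main() {
    // Tokens as the error output suggests they arrive: "C++" is split
    // into "C", "+", "+".
    let tokens = ["the", "GNU", "C", "+", "+", "compiler"];
    let dictionary: HashSet<&str> = ["the", "GNU", "C", "compiler"].iter().copied().collect();
    let no_quirks: [fn(&str) -> bool; 0] = [];
    let with_plus: [fn(&str) -> bool; 1] = [plus_only];
    println!("{:?}", reported(&tokens, &dictionary, &no_quirks)); // ["+", "+"]
    println!("{:?}", reported(&tokens, &dictionary, &with_plus)); // []
}
```

Note that such a pattern accepts every standalone "+" token, not only the ones that come from "C++".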

The workaround is to... yes, exactly this: allow "+" tokens. Tokenization is done by a third-party library and will never be perfect. Either use backticks (```) or add the workaround you found.

If you would like to make cargo-spellcheck aware of additional split characters, there is the tokenization_splitchars option in the [Hunspell] section.
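For the em dash case, extra split characters break a token into fragments before lookup. A minimal sketch, assuming the configured split characters act as separators that are dropped from the resulting fragments; `split_token` and the one-character value used here are hypothetical and not the option's default:

```rust
/// Splits a token at every character listed in `splitchars` and drops
/// empty fragments (assumption: approximates how additional split
/// characters are applied; not cargo-spellcheck's own code).
fn split_token<'a>(token: &'a str, splitchars: &str) -> Vec<&'a str> {
    token
        .split(|c: char| splitchars.contains(c))
        .filter(|fragment| !fragment.is_empty())
        .collect()
}

fn main() {
    // Hypothetical value: only the em dash (U+2014) is listed.
    let splitchars = "\u{2014}";
    println!("{:?}", split_token("spelling\u{2014}grammar", splitchars)); // ["spelling", "grammar"]
}
```

Each fragment is then checked on its own, so the dash itself never reaches the dictionary lookup in this model.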


Thanks, that's even better!

BTW, it's not mentioned in

https://github.com/drahnr/cargo-spellcheck/blob/master/docs/configuration.md

and I had to run cargo spellcheck config --stdout to find out about this parameter.