[Bug]: `doesn''t` in yaml is not recognized correctly

JounQin opened this issue 6 months ago · comments

JounQin commented 6 months ago

Info

Kind of Issue

runtime - command-line tools

Which Tool or library

cspell -- the command-line spelling tool

Which Version

Version: 7.3.2

Bug Description

Describe the bug

In a YAML single-quoted string, `doesn''t` means `doesn't`: a doubled single quote is the escape for a literal single quote.

To Reproduce

Steps to reproduce the behavior:

N/A

Expected behavior

No error

Screenshots

`doesn` is reported as an unknown word.

Additional context

N/A

cspell.json

N/A

Example Repository (Optional)

N/A

Jason Dent · Answer 1 · Mon Jan 08 2024 20:46:39 GMT+0800 (China Standard Time)

@JounQin,

Thank you. I hadn't realized that is how single quotes are embedded in a single quoted string.

At the moment, the spell checker isn't using a sophisticated parser; it basically treats the whole file as a text document. The plan is to allow a custom parser per file type.
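To illustrate why treating the file as plain text leads to `doesn` being reported, here is a minimal sketch. The word pattern below is an assumption made for illustration only; it is not cspell's actual tokenizer.

```python
import re

# Hypothetical word pattern: runs of letters, optionally joined by a single
# apostrophe. This is an illustrative assumption, not cspell's real tokenizer.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def naive_words(text: str) -> list[str]:
    """Split raw text into words with no knowledge of YAML quoting."""
    return WORD.findall(text)


raw_line = "description: 'This path / doesn''t exist'"
# The doubled quote breaks the contraction into two fragments, so a
# dictionary lookup sees "doesn" and "t" rather than "doesn't".
print(naive_words(raw_line))
```

With a splitter like this, the escaped quote never reaches the dictionary lookup as part of a contraction, which matches the symptom in the report.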

As a workaround, it is possible to ignore doesn''t with an inline comment or add it to a word list.

items:
  - name: name
    description: 'This path / doesn''t exist'

# cspell:ignore doesn''t

or

cspell.config.yaml

# yaml-language-server: $schema=https://raw.githubusercontent.com/streetsidesoftware/cspell/main/cspell.schema.json

dictionaryDefinitions:
  - name: special-yaml-words
    words:
      - doesn''t
      - hasn''t
      - isn''t
languageSettings:
  - languageId: yaml
    locale: en,en-US,en-GB
    dictionaries:
      - special-yaml-words

JounQin · Answer 2 · Mon Jan 08 2024 21:03:40 GMT+0800 (China Standard Time)

Yeah, I'm using a disable comment as a workaround. Could the cspell core treat those words as known for YAML files?

Jason Dent · Answer 3 · Thu Jan 11 2024 15:10:34 GMT+0800 (China Standard Time)

@JounQin,

I have to think about how to do it. The core is language and file type agnostic. Everything is configuration driven.

There are two approaches that could work:

1. Parser. The spell checker supports a parser that can be used to transform text before it is spell checked. This is how the ESLint plug-in works. However, the parser is not currently a public API. The idea with the parser is to also annotate blocks of text by tagging them as string, identifier, comment, import, keyword, field name, etc. These tags could then be used to apply different spell checking rules, for example field names should be in English, but string values should be in French for i18n files. In this case, the YAML parser would transform 'doesn''t' into doesn't before it is checked.
2. Using a tryMap. A tryMap is used to replace equivalent character sequences when checking a dictionary. For example: ä with ae or æ when searching for word matches. This is currently supported at the dictionary level, meaning each dictionary (German, French, English) has its own tryMap, but it is not currently available at the file type level.
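The parser idea can be sketched as a transform that decodes single-quoted YAML scalars before the text is checked. This is only a rough illustration: `unescape_single_quoted` is a hypothetical helper, it is not the cspell parser API (which is not public), and it assumes one-line scalars with no comments or other quoting styles on the line.

```python
import re

# Matches a single-quoted YAML scalar on one line; '' inside the quotes is
# the escape for a literal single quote.
# Simplifying assumption: no multi-line scalars, comments, or double quotes.
SINGLE_QUOTED = re.compile(r"'((?:[^']|'')*)'")


def unescape_single_quoted(line: str) -> str:
    """Replace each single-quoted scalar with its decoded value."""
    return SINGLE_QUOTED.sub(lambda m: m.group(1).replace("''", "'"), line)


line = "description: 'This path / doesn''t exist'"
# The decoded text contains the ordinary contraction, so a spell checker
# reading it would look up "doesn't" instead of "doesn".
print(unescape_single_quoted(line))
```

A real parser would also need to keep a mapping from decoded offsets back to the original file so that reported positions still point at the right column; this sketch leaves that out.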

Both of these approaches have their merits. My preference is to get option 1, the parser, finished, but it is the furthest from being finished. Option 2 is much easier, since most of the support is already there.
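For comparison, the tryMap idea at the file type level could look roughly like the following. `DICTIONARY`, `YAML_TRY_MAP`, and `is_known` are hypothetical names for illustration; this does not reflect how cspell implements tryMap internally, and a file-type-level map does not exist in cspell as of this thread.

```python
# Hypothetical stand-ins: a tiny dictionary and a file-type-level map of
# equivalent character sequences (sequence in the file -> dictionary form).
DICTIONARY = {"doesn't", "hasn't", "isn't", "exist"}
YAML_TRY_MAP = {"''": "'"}


def is_known(word: str, try_map: dict[str, str]) -> bool:
    """Look up the word as written, then with mapped sequences replaced."""
    if word in DICTIONARY:
        return True
    candidate = word
    for source, target in try_map.items():
        candidate = candidate.replace(source, target)
    return candidate in DICTIONARY


# With the YAML map, the escaped form resolves to the dictionary entry.
print(is_known("doesn''t", YAML_TRY_MAP))
# Without a map, the same word is unknown.
print(is_known("doesn''t", {}))
```

The appeal of this approach is that the text itself is left untouched; only the lookup changes, which is why it needs less new machinery than a parser.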

If you have any ideas on funding, they would be most welcome.

By the way, thank you for synckit; I use it with the ESLint plugin.

JounQin · Answer 4 · Thu Jan 11 2024 15:58:22 GMT+0800 (China Standard Time)

I also prefer option 1. This issue is not urgent; it could be a known issue for now, but with a correct parser, a lot of hidden issues will be fixed at once.

Challenging but worthwhile.

I can help if you want anything from me.

> By the way, thank you for synckit, I use it with the ESLint plugin.

That's my pleasure. 🩷

I should definitely add a "Who is using synckit" section to its README.