drahnr / cargo-spellcheck

Checks all your documentation for spelling and grammar mistakes with hunspell and an nlprule-based checker for grammar

accept arbitrary urls

drahnr opened this issue · comments

Describe the bug

Image alt text is spell checked, but we should be more lax there and allow arbitrary URLs. Alternatively, until #44 is resolved, image alt text should be ignored entirely. Normally it is never shown anyway.

To Reproduce

Steps to reproduce the behaviour:

  1. A file containing [![crates.io](https://img.shields.io/crates/v/cargo_spellcheck.svg)](https://crates.io/crates/cargo-spellcheck)
  2. Run cargo spellcheck README.md

Expected behavior

Please complete the following information:

  • System: Fedora
  • Obtained: git

Additional context

This is also an issue with the tokenizer, since it splits on . characters. But we could whitelist valid URLs in image alt text.

Note that this can most likely not be solved by a user dictionary, since it will never see the full url, but only subtokens.
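To illustrate why a user dictionary cannot help here, the following is a minimal sketch of a tokenizer that splits on every non-alphanumeric character. The function `split_like_tokenizer` is hypothetical and only stands in for the behaviour described above; it is not the tokenizer cargo-spellcheck actually uses.

```rust
/// Hypothetical stand-in for a tokenizer that splits on `.` and other
/// punctuation. This is an assumption for illustration only, not the
/// real cargo-spellcheck tokenizer.
fn split_like_tokenizer(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        // Adjacent separators such as `://` would produce empty pieces.
        .filter(|token| !token.is_empty())
        .collect()
}
```

With this sketch, `split_like_tokenizer("crates.io")` yields `["crates", "io"]`, so the checker only ever sees `io` on its own, and a dictionary entry for `crates.io` would never be matched against it.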

Since this has so much overlap with #44, I am assigning this to you as well - maybe an additional test case will already be enough given your work on #113

It makes sense, thanks! So do you mean that the alt text should not be corrected when it is followed by a link? Or that the link should not be corrected?

This is what it looks like on this branch: laysa-cmark-links-handled

error: spellcheck(Hunspell)
  --> /home/tmhdev/Documents/cargo-spellcheck/README.md:2
   |
 2 | ![altext](filenamve)
   |   ^^^^^^
   | - alt ext, alt-ext, textual, textural, external, or exalt
   |
   |   Possible spelling mistake found.

error: spellcheck(Hunspell)
  --> /home/tmhdev/Documents/cargo-spellcheck/README.md:3
   |
 3 | [![crates.io](https://img.shields.io/crates/v/cargo_spellcheck.svg)](https://crates.io/crates/cargo-spellcheck)
   |           ^^
   | - oi, Io, ii, ion, bio, Rio, is, or one of 7 others
   |
   |   Possible spelling mistake found.

I think the alt text should first be tested for being a URL; if it is, just accept it. If not, check the text like any other. I think that makes the most sense. Alt text usually contains only a few keywords or a URL.
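A rough sketch of that rule, using only the standard library. The names `looks_like_url` and `should_spellcheck_alt` are hypothetical, and the heuristic is an assumption rather than what cargo-spellcheck implements: it accepts an optional http(s) scheme followed by a dotted host name, so a bare `crates.io` counts as a URL.

```rust
/// Heuristic: does `text` look like a URL or a bare host name such as
/// `crates.io`? Hypothetical helper, not part of cargo-spellcheck.
fn looks_like_url(text: &str) -> bool {
    let text = text.trim();
    if text.is_empty() || text.chars().any(char::is_whitespace) {
        return false;
    }
    // Strip an optional scheme.
    let rest = text
        .strip_prefix("https://")
        .or_else(|| text.strip_prefix("http://"))
        .unwrap_or(text);
    // The host is everything before the first path, query or fragment.
    let host = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    let labels: Vec<&str> = host.split('.').collect();
    // Require at least two non-empty labels and an alphabetic last label.
    labels.len() >= 2
        && labels.iter().all(|label| {
            !label.is_empty()
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
        && labels.last().map_or(false, |tld| {
            tld.len() >= 2 && tld.chars().all(|c| c.is_ascii_alphabetic())
        })
}

/// Alt text is spell checked only when it is not URL-like.
fn should_spellcheck_alt(alt_text: &str) -> bool {
    !looks_like_url(alt_text)
}
```

Under this heuristic `crates.io` is skipped while `altext` is still checked. It also accepts file names such as `README.md`, so a stricter check against known top-level domains may be preferable.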

For the example,
[![crates.io](https://img.shields.io/crates/v/cargo_spellcheck. svg)] (https://crates.io/crates/cargo-spellcheck)
What should be checked in this case? Highlighted in bold.

Ignore the spaces I added; they are only there to keep the image and links from being rendered.

I would expect crates.io to be tested for being a valid URL and, if it is, to be skipped by the spell checker.

Additionally, if requested by the user, both URLs (https://img.shields.io/crates/v/cargo_spellcheck.svg and https://crates.io/crates/cargo-spellcheck) should be checked for existence.

Does that make sense?

To speed this up, we could borrow some code from https://github.com/deadlinks/cargo-deadlinks , which works on top of HTML files. We could reuse the link-check code and apply it directly to the source, without having to deal with the HTML parsing. Just an idea though.
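As a sketch of what working directly on the source could look like, the snippet below collects link and image targets from a line of markdown so that they could later be handed to an existence check. The function `link_targets` is hypothetical and is not taken from cargo-deadlinks; it does a naive scan for `](` ... `)`, performs no network access, and ignores reference-style links and parentheses inside URLs, which a proper CommonMark parser would handle.

```rust
/// Collect the targets of inline links and images, i.e. the text between
/// `](` and the next `)`. Naive scan for illustration only; a hypothetical
/// helper, not the cargo-deadlinks implementation.
fn link_targets(markdown: &str) -> Vec<&str> {
    let mut targets = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find("](") {
        // Skip past the two-byte `](` marker.
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                targets.push(&after[..end]);
                rest = &after[end + 1..];
            }
            // Unterminated target: stop scanning.
            None => break,
        }
    }
    targets
}
```

For the badge line from the reproduction steps this returns the image URL first and the crate URL second, which are the two URLs an optional existence check would need to visit.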