accept arbitrary urls
drahnr opened this issue · comments
Describe the bug
Alt image text is spell checked, but we should be a bit more lax there and allow arbitrary URLs; alternatively, until #44 is addressed, the alt image text should be ignored. Normally it is never shown anyway.
To Reproduce
Steps to reproduce the behaviour:
- A file containing
[![crates.io](https://img.shields.io/crates/v/cargo_spellcheck.svg)](https://crates.io/crates/cargo-spellcheck)
- Run
cargo spellcheck README.md
Expected behavior
Please complete the following information:
- System: Fedora
- Obtained: git
Additional context
This is an issue with the tokenizer too, since it splits on `.` tokens. But we could whitelist valid URLs in alt image names.
Note that this can most likely not be solved by a user dictionary, since it will never see the full url, but only subtokens.
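To illustrate why a user dictionary cannot help here, below is a minimal sketch of a tokenizer that splits on every non-alphanumeric character. `naive_tokens` is a hypothetical stand-in, not the actual cargo-spellcheck tokenizer; it only shows that the checker is handed `crates` and `io` separately, so a dictionary entry for the full `crates.io` would never be matched.

```rust
/// Hypothetical stand-in for a word tokenizer (NOT the real
/// cargo-spellcheck one): every non-alphanumeric character,
/// including `.`, `/`, `:` and `_`, ends the current token.
fn naive_tokens(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        // Consecutive separators produce empty pieces; drop them.
        .filter(|token| !token.is_empty())
        .collect()
}
```

With this splitting, `naive_tokens("crates.io")` yields `["crates", "io"]`, which matches the `io` finding reported by Hunspell.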
It makes sense, thanks! So do you mean that the alt text should not be corrected when it is followed by a link, or that the link itself should not be corrected?
This is what it looks like in this branch: laysa-cmark-links-handled
error: spellcheck(Hunspell)
--> /home/tmhdev/Documents/cargo-spellcheck/README.md:2
|
2 | ![altext](filenamve)
| ^^^^^^
| - alt ext, alt-ext, textual, textural, external, or exalt
|
| Possible spelling mistake found.
error: spellcheck(Hunspell)
--> /home/tmhdev/Documents/cargo-spellcheck/README.md:3
|
3 | [![crates.io](https://img.shields.io/crates/v/cargo_spellcheck.svg)](https://crates.io/crates/cargo-spellcheck)
| ^^
| - oi, Io, ii, ion, bio, Rio, is, or one of 7 others
|
| Possible spelling mistake found.
I think we should first check whether the alt text is a URL and, if so, just accept it. If not, spell check the text like any other. I think that makes the most sense. Usually alt text contains only a few keywords or a URL.
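A rough sketch of that rule, using only the standard library. Both `looks_like_url` and `should_spellcheck_alt` are hypothetical names, and the heuristic is an assumption on my side: anything with an `http://` or `https://` prefix, or a bare dot-separated domain such as `crates.io`, counts as a URL. A real implementation might rather lean on a proper URL parser such as the `url` crate, keeping in mind that a scheme-less `crates.io` is not an absolute URL.

```rust
/// Hypothetical heuristic: does `text` look like a URL or a bare domain?
fn looks_like_url(text: &str) -> bool {
    let text = text.trim();
    // URLs in alt text are assumed to contain no whitespace.
    if text.is_empty() || text.chars().any(char::is_whitespace) {
        return false;
    }
    // Explicit scheme: accept anything non-empty after it.
    for scheme in ["http://", "https://"] {
        if let Some(rest) = text.strip_prefix(scheme) {
            return !rest.is_empty();
        }
    }
    // Bare domain such as `crates.io`: at least two non-empty labels
    // made of ASCII letters, digits or `-`, ending in an alphabetic TLD.
    let labels: Vec<&str> = text.split('.').collect();
    if labels.len() < 2 {
        return false;
    }
    let labels_ok = labels.iter().all(|label| {
        !label.is_empty()
            && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
    });
    let tld = labels[labels.len() - 1];
    labels_ok && tld.len() >= 2 && tld.chars().all(|c| c.is_ascii_alphabetic())
}

/// Alt text is only handed to the spell checker if it is not a URL.
fn should_spellcheck_alt(alt: &str) -> bool {
    !looks_like_url(alt)
}
```

Under these assumptions `crates.io` would be skipped, while `altext` from the output above would still be checked.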
For the example,
[![crates.io](https://img.shields.io/crates/v/cargo_spellcheck. svg)] (https://crates.io/crates/cargo-spellcheck)
What should be checked in this case? The candidates are highlighted in bold.
- [![crates.io](https://img.shields.io/crates/v/cargo_spellcheck. svg)](https://crates.io/crates/cargo-spellcheck)
- [![crates.io](https://img.shields.io/crates/v/cargo_spellcheck. svg)] (https://crates.io/crates/cargo-spellcheck)
- [![crates.io](https://img.shields.io/crates/v/cargo_spellcheck.svg)] (https://crates.io/crates/cargo-spellcheck)
Ignore the spaces I added; they are only there to avoid rendering images or creating links.
I would expect that `crates.io` is tested for being a valid URL, and if it is, it is skipped by the spell checker.
Additionally, if requested by the user, both URLs should be checked for existence (https://img.shields.io/crates/v/cargo_spellcheck.svg and https://crates.io/crates/cargo-spellcheck).
Does that make sense?
To speed this up, we could borrow some code from https://github.com/deadlinks/cargo-deadlinks , which works on top of HTML files. We could re-use the link check code and apply it directly to the source, without having to deal with the HTML parsing. Just an idea though.
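As a rough illustration of working directly on the source, the sketch below collects the link targets from a line of markdown with a plain string scan. `link_targets` is a hypothetical helper and deliberately simplistic (it assumes no nested parentheses and no reference-style links); a real implementation would more likely take the targets from the markdown parser. The existence check itself needs network access and is left out here.

```rust
/// Hypothetical helper: collect the targets of inline links and images,
/// i.e. whatever sits between `](` and the next `)`.
/// Assumes targets contain no `)` and ignores reference-style links.
fn link_targets(markdown: &str) -> Vec<&str> {
    let mut targets = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find("](") {
        // Skip past the two-byte `](` marker.
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                targets.push(&after[..end]);
                // Continue scanning right after the closing `)`.
                rest = &after[end + 1..];
            }
            // Unterminated target: stop scanning.
            None => break,
        }
    }
    targets
}
```

For the badge line from the report this returns both the image URL and the link URL, which could then be handed to whatever link checker ends up being used.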