gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ignore_files no longer works with regex

jamespohalloran opened this issue · comments

After the 4.2.0 update, ignore_files no longer seems to be working with a regex (I was previously using --file-ignore)

Usgae

Running things currently on 4.3.0

arguments: --only_4xx --checks "Links,Images,Scripts,Favicon,OpenGraph" --ignore_status_codes "400, 403, 409, 429" --allow_hash_href --ignore_files "/.+\/(blog|404)\/.+/"

Result:

You can see it's still failing on a /out/blog/... file, which should be ignored: https://github.com/tinacms/tinacms.org/runs/7541078705?check_suite_focus=true#step:7:18

I was surprised to hear this broke since nothing changed around these options in 4.3.0. I even added a test case matching your regex and couldn't reproduce the error. Could you try replacing the underscores with dashes, eg --ignore-files instead of --ignore_files? And you're certain that ignore_files regex worked in < 4.3.0?

@jamespohalloran, the issue is with the sequence

--allow_hash_href --ignore_files "/.+\/(blog|404)\/.+/"

Since --allow-hash-href should be used with an actual value, e.g. --allow-hash-href=false or --allow-hash-href false, --ignore_files is interpreted as the value (ultimately interpreted as boolean true). This was indeed changed in 4.3.0 for many options via 565bdd9.

The following would fix the issue (options should be --separated, although they seem to work also with _):

--allow-hash-href=true --ignore-files "/.+\/(blog|404)\/.+/"

@gjtorikian, I just confirmed locally, and this type of issues would be captured if the option was defined with boolean type (any of TrueClass/FalseClass should do), something I had looked into in #735 (comment), e.g.

  p.option 'allow_hash_href', '--allow-hash-href=<true|false>',  'TrueClass', 'If `true`, assumes `href="#"` anchors are valid (default: `true`)'

With this, you get e.g.

bundle exec htmlproofer --allow-hash-href --ignore-files "/.+\/(blog|404)\/.+/"
# [...]
# mercenary/program.rb:33:in `go': invalid argument: --allow-hash-href --ignore-files (OptionParser::InvalidArgument)

confirming also that --ignore-files is used as value for --allow-hash-href.
This also does not require the explicit conversion from string values.

Argh, yeah, it might be time for me to change the CLI parser. It's been on my mind today (cc #742 (comment)), but I won't be able to put work into it until next week.

@gjtorikian, yes it might be a good idea as it currently looks tricky to define and maintain the arguments defining a robust CLI.

If this might be taking a bit of time and you think it is worth to quickly make sure issues like the one above are detected (which potentially affect anyone using some of changed options in 4.3.0), I am happy to help out with a small PR introducing the boolean type.
Otherwise of course just advising and making sure users are always including =true/false should be enough for the time being.

Since --allow-hash-href should be used with an actual value, e.g. --allow-hash-href=false or --allow-hash-href false, --ignore_files is interpreted as the value (ultimately interpreted as boolean true). This was indeed changed in 4.3.0 for many options via 565bdd9.

The following would fix the issue (options should be --separated, although they seem to work also with _):

Aha! That explains why I could never disable the check-internal-hash check in this issue: #725

Thank you! Appreciate all the help, and this project!

Closing this specific issue out but still searching for a Ruby CLI parser that's more intelligent.