gjtorikian / html-proofer

Test your rendered HTML files to make sure they're accurate.

"ERROR: Invalid predicate" on ugly Maven search URL

nchammas opened this issue:

On this page is the following link:

For an up-to-date list, please refer to the Maven repository for the full list of supported sources and artifacts.

Checking this with a locally built version of that page yields the following error:

`evaluate': ERROR: Invalid predicate: 
  //*[@name="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]
  |/*[@name="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""]
  |//*[@id="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]
  |//*[@id="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""] (Nokogiri::XML::XPath::SyntaxError)

I assume this should be handled more gracefully somehow since the link does appear to be valid HTML and works for me in Safari.

For reference, the full trace is:

bundler: failed to load command: htmlproofer (.../spark/docs/.local_ruby_bundle/ruby/3.3.0/bin/htmlproofer)
.../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:238:in `evaluate': ERROR: Invalid predicate: //*[@name="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]|/*[@name="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""]|//*[@id="search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%224.0.0%22"]|//*[@id="search|ga|1|g:"org.apache.spark" AND v:"4.0.0""] (Nokogiri::XML::XPath::SyntaxError)
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:238:in `xpath_impl'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:219:in `xpath_internal'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/nokogiri-1.16.0-arm64-darwin/lib/nokogiri/xml/searchable.rb:182:in `xpath'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:159:in `check_hash_in_2xx_response'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:93:in `response_handler'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:78:in `block in queue_request'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/callbacks.rb:146:in `block in execute_callbacks'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/callbacks.rb:145:in `each'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/callbacks.rb:145:in `execute_callbacks'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/request/operations.rb:35:in `finish'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/easy_factory.rb:170:in `block in set_callback'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/easy/response_callbacks.rb:74:in `block in complete'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/easy/response_callbacks.rb:74:in `each'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/easy/response_callbacks.rb:74:in `complete'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/multi/operations.rb:189:in `check'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/multi/operations.rb:202:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/ethon-0.16.0/lib/ethon/multi/operations.rb:50:in `perform'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/hydra/runnable.rb:15:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/typhoeus-1.4.1/lib/typhoeus/hydra/memoizable.rb:51:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:69:in `run_external_link_checker'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/url_validator/external.rb:31:in `validate'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/runner.rb:146:in `validate_external_urls'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/runner.rb:97:in `check_files'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/runner.rb:50:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/lib/html_proofer/cli.rb:22:in `run'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/exe/htmlproofer:14:in `block in <top (required)>'
        from .../.rbenv/versions/3.3.0/lib/ruby/3.3.0/benchmark.rb:313:in `realtime'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/gems/html-proofer-5.0.8/exe/htmlproofer:14:in `<top (required)>'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/bin/htmlproofer:25:in `load'
        from .../spark/docs/.local_ruby_bundle/ruby/3.3.0/bin/htmlproofer:25:in `<top (required)>'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli/exec.rb:58:in `load'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli/exec.rb:58:in `kernel_load'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli/exec.rb:23:in `run'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli.rb:492:in `exec'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli.rb:34:in `dispatch'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/cli.rb:28:in `start'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/exe/bundle:37:in `block in <top (required)>'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
        from .../.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/bundler-2.4.22/exe/bundle:29:in `<top (required)>'
        from .../.rbenv/versions/3.3.0/bin/bundle:25:in `load'
        from .../.rbenv/versions/3.3.0/bin/bundle:25:in `<main>'

@nchammas, I could reproduce the error (with HTMLProofer 5.0.8) as follows:

htmlproofer --as-links "https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"

I'm not sure HTMLProofer can do much to make this particular URL checkable as written, since the error is actually raised by Nokogiri.

Still, HTMLProofer's --swap-urls option lets you strip the part of the URL that causes the problem, e.g.

htmlproofer --swap-urls "search\.maven\.org/#search.*:search.maven.org" --as-links "https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"

You can find the full documentation for --swap-urls in the README.
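To illustrate what the swap does to the URL, here is a minimal sketch in plain Ruby. It assumes the option applies the pattern as a regular-expression substitution on each URL; it does not call HTMLProofer itself, and the variable names are only for illustration.

```ruby
# Pattern and replacement copied from the --swap-urls argument above
# ("pattern:replacement"). Assumption: the swap behaves like String#gsub.
pattern = %r{search\.maven\.org/#search.*}
replacement = "search.maven.org"

url = "https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"

# Everything from "search.maven.org/#search" to the end of the URL is replaced,
# so the fragment is gone before any anchor check could run.
swapped = url.gsub(pattern, replacement)

puts swapped
# prints: https://search.maven.org
```

With the fragment removed, only the page itself is left to check.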

Hope this helps.

To be precise, the error occurs because HTMLProofer uses Nokogiri to check whether #search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22 matches an anchor (an element with that name or id) on https://search.maven.org/.

However, this fragment is not meant to be an in-page anchor. It carries query parameters that the page processes via JavaScript, and that is not something HTMLProofer or Nokogiri can do much about.
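For anyone wondering why the XPath itself is rejected, here is a rough sketch using only Ruby's standard library. The helper `naive_anchor_xpath` is hypothetical and only mirrors the shape of the expression in the error message; it is not HTMLProofer's actual code. The point is that the decoded fragment contains double quotes, and XPath 1.0 string literals have no way to escape them.

```ruby
require "uri"

# Fragment taken from the URL in the reproduction command.
fragment = "search%7Cga%7C1%7Cg%3A%22org.apache.spark%22%20AND%20v%3A%223.5.0%22"

# Percent-decoding turns each %22 into a literal double quote.
decoded = URI.decode_www_form_component(fragment)

# Hypothetical helper: interpolates both forms of the fragment into
# double-quoted XPath literals, roughly like the failing expression.
def naive_anchor_xpath(encoded, decoded)
  [
    %(//*[@name="#{encoded}"]),
    %(//*[@name="#{decoded}"]),
    %(//*[@id="#{encoded}"]),
    %(//*[@id="#{decoded}"]),
  ].join("|")
end

xpath = naive_anchor_xpath(fragment, decoded)

puts decoded
# prints: search|ga|1|g:"org.apache.spark" AND v:"3.5.0"
puts xpath
```

In the decoded branches, the quote before org.apache.spark closes the string literal early, so whatever follows is no longer a valid predicate.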

Therefore, you need to tell HTMLProofer that this is not a hash to be checked, which is what the --swap-urls option above does by stripping #search...

Closing this issue