rfc3986_parser.rb:67:in `split': bad URI(is not URI?)
David Strejc commented
I got the following error:
Traceback (most recent call last):
10: from /usr/bin/whatweb:981:in `block (2 levels) in <main>'
9: from /usr/bin/whatweb:981:in `loop'
8: from /usr/bin/whatweb:998:in `block (3 levels) in <main>'
7: from /usr/share/whatweb/lib/target.rb:237:in `get_redirection_target'
6: from /usr/lib/ruby/2.5.0/uri/common.rb:275:in `join'
5: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
4: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
3: from /usr/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
2: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
1: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
/usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?): http://www.reznictvi-chochola.cz/uvod.html (URI::InvalidURIError)
Andrew Horton commented
There is an invalid meta tag in http://www.reznictvi-chochola.cz/index.html
<meta http-equiv="refresh" content="5;url= http://www.reznictvi-chochola.cz/uvod.html">
Note the space between url= and http. This is why the rfc3986 parser is raising an error.
./whatweb http://www.reznictvi-chochola.cz/index.html
#<Thread:0x0000562e0cab6220@./whatweb:979 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
10: from ./whatweb:981:in `block (2 levels) in <main>'
9: from ./whatweb:981:in `loop'
8: from ./whatweb:998:in `block (3 levels) in <main>'
7: from /home/urban/projects/WhatWeb/lib/target.rb:237:in `get_redirection_target'
6: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/common.rb:275:in `join'
5: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
4: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
3: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
2: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
1: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
/home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?): http://www.reznictvi-chochola.cz/uvod.html (URI::InvalidURIError)
http://www.reznictvi-chochola.cz/index.html [200 OK] Country[CZECH REPUBLIC][CZ], HTTPServer[Microsoft-IIS/6.0], IP[81.2.194.166], Meta-Author[Kamila Kostřicová], Meta-Refresh-Redirect[ http://www.reznictvi-chochola.cz/uvod.html], Microsoft-IIS[6.0], Title[Řeznictví a uzenářství Josef Chochola], X-Powered-By[ASP.NET]
Traceback (most recent call last):
10: from ./whatweb:981:in `block (2 levels) in <main>'
9: from ./whatweb:981:in `loop'
8: from ./whatweb:998:in `block (3 levels) in <main>'
7: from /home/urban/projects/WhatWeb/lib/target.rb:237:in `get_redirection_target'
6: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/common.rb:275:in `join'
5: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
4: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
3: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
2: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
1: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
/home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?): http://www.reznictvi-chochola.cz/uvod.html (URI::InvalidURIError)
bcoles commented
Stripping leading and trailing whitespace from the redirection URL would resolve this instance, and it is probably safe to do.
 if @@meta_refresh_regex =~ @body
   metarefresh = @body.scan(@@meta_refresh_regex).flatten.first
-  metarefresh = decode_html_entities(metarefresh)
+  metarefresh = decode_html_entities(metarefresh).strip
   newtarget_m = URI.join(@target, metarefresh).to_s # this works for relative and absolute
 end
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> URI.join('https://example.com/', 'http://www.reznictvi-chochola.cz/uvod.html')
=> #<URI::HTTP http://www.reznictvi-chochola.cz/uvod.html>
irb(main):003:0> URI.join('https://example.com/', ' http://www.reznictvi-chochola.cz/uvod.html')
Traceback (most recent call last):
9: from /usr/bin/irb:11:in `<main>'
8: from (irb):3
7: from /usr/lib/ruby/2.5.0/uri/common.rb:275:in `join'
6: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
5: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
4: from /usr/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
3: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
2: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
1: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): http://www.reznictvi-chochola.cz/uvod.html)
irb(main):004:0> URI.join('https://example.com/', ' http://www.reznictvi-chochola.cz/uvod.html'.strip)
=> #<URI::HTTP http://www.reznictvi-chochola.cz/uvod.html>
irb(main):005:0>
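For reference, a minimal sketch of the same idea as a standalone helper. The name `resolve_redirect` and the choice to return `nil` on failure are assumptions for illustration only and are not part of WhatWeb. It strips the value as suggested above and additionally rescues `URI::InvalidURIError`, so a malformed redirect target that stripping cannot repair would not terminate the scanning thread.

```ruby
require 'uri'

# Hypothetical helper (not WhatWeb code): resolve a meta refresh target
# against the URL of the page it was found on.
# Returns the absolute URL as a String, or nil if it cannot be parsed.
def resolve_redirect(base, raw)
  candidate = raw.to_s.strip # drop leading and trailing whitespace
  return nil if candidate.empty?

  URI.join(base, candidate).to_s # handles relative and absolute targets
rescue URI::InvalidURIError
  # Still not a valid URI after stripping (e.g. a space inside the host).
  nil
end

base = 'http://www.reznictvi-chochola.cz/index.html'

# The value from this issue, with the leading space left in place.
puts resolve_redirect(base, ' http://www.reznictvi-chochola.cz/uvod.html')

# A relative target resolves against the base URL.
puts resolve_redirect(base, 'uvod.html')

# An unrepairable value yields nil instead of raising.
p resolve_redirect(base, 'http://exa mple.com/')
```

Whether to skip the redirect silently or log a warning when `nil` comes back is a design choice left open here.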