urbanadventurer / WhatWeb

Next generation web scanner

Home Page: https://www.morningstarsecurity.com/research/whatweb

rfc3986_parser.rb:67:in `split': bad URI(is not URI?)

david-strejc opened this issue

I got the following error:

Traceback (most recent call last):
        10: from /usr/bin/whatweb:981:in `block (2 levels) in <main>'
         9: from /usr/bin/whatweb:981:in `loop'
         8: from /usr/bin/whatweb:998:in `block (3 levels) in <main>'
         7: from /usr/share/whatweb/lib/target.rb:237:in `get_redirection_target'
         6: from /usr/lib/ruby/2.5.0/uri/common.rb:275:in `join'
         5: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
         4: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
         3: from /usr/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
         2: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
         1: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
/usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?):  http://www.reznictvi-chochola.cz/uvod.html (URI::InvalidURIError)     

There is an invalid meta tag in http://www.reznictvi-chochola.cz/index.html
<meta http-equiv="refresh" content="5;url= http://www.reznictvi-chochola.cz/uvod.html">

Note the space between url= and http. This is why the rfc3986 parser is raising an error.
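To illustrate where the stray space comes from, here is a minimal sketch of splitting the `content` attribute of a meta refresh tag into its delay and target. The helper name `split_meta_refresh` and its regex are assumptions made for illustration; this is not WhatWeb's actual `@@meta_refresh_regex`.

```ruby
# Hypothetical helper (not WhatWeb code): split a meta refresh "content"
# value such as "5;url= http://host/page" into [delay, target].
def split_meta_refresh(content)
  # Split once on ";url=" (case-insensitive). Anything after "url=" is
  # kept verbatim, including leading whitespace.
  delay, target = content.split(/;\s*url=/i, 2)
  [delay.to_i, target]
end

delay, target = split_meta_refresh('5;url= http://www.reznictvi-chochola.cz/uvod.html')
puts delay            # the refresh delay in seconds
puts target.inspect   # the target still carries the leading space
puts target.strip     # the target with surrounding whitespace removed
```

The target comes back with the leading space intact, which is the string that later reaches `URI.join`.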

 ./whatweb http://www.reznictvi-chochola.cz/index.html 
#<Thread:0x0000562e0cab6220@./whatweb:979 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	10: from ./whatweb:981:in `block (2 levels) in <main>'
	 9: from ./whatweb:981:in `loop'
	 8: from ./whatweb:998:in `block (3 levels) in <main>'
	 7: from /home/urban/projects/WhatWeb/lib/target.rb:237:in `get_redirection_target'
	 6: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/common.rb:275:in `join'
	 5: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
	 4: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
	 3: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
	 2: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
	 1: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
/home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?):  http://www.reznictvi-chochola.cz/uvod.html (URI::InvalidURIError)
http://www.reznictvi-chochola.cz/index.html [200 OK] Country[CZECH REPUBLIC][CZ], HTTPServer[Microsoft-IIS/6.0], IP[81.2.194.166], Meta-Author[Kamila Kostřicová], Meta-Refresh-Redirect[ http://www.reznictvi-chochola.cz/uvod.html], Microsoft-IIS[6.0], Title[Řeznictví a uzenářství Josef Chochola], X-Powered-By[ASP.NET]
Traceback (most recent call last):
	10: from ./whatweb:981:in `block (2 levels) in <main>'
	 9: from ./whatweb:981:in `loop'
	 8: from ./whatweb:998:in `block (3 levels) in <main>'
	 7: from /home/urban/projects/WhatWeb/lib/target.rb:237:in `get_redirection_target'
	 6: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/common.rb:275:in `join'
	 5: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
	 4: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
	 3: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
	 2: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
	 1: from /home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
/home/urban/.rbenv/versions/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split': bad URI(is not URI?):  http://www.reznictvi-chochola.cz/uvod.html (URI::InvalidURIError)

Stripping leading and trailing whitespace from the redirection URL would fix this case, and is probably safe to do in general.

    if @@meta_refresh_regex =~ @body
      metarefresh = @body.scan(@@meta_refresh_regex).flatten.first
-      metarefresh = decode_html_entities(metarefresh)
+      metarefresh = decode_html_entities(metarefresh).strip
      newtarget_m = URI.join(@target, metarefresh).to_s # this works for relative and absolute
    end
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> URI.join('https://example.com/', 'http://www.reznictvi-chochola.cz/uvod.html')
=> #<URI::HTTP http://www.reznictvi-chochola.cz/uvod.html>
irb(main):003:0> URI.join('https://example.com/', ' http://www.reznictvi-chochola.cz/uvod.html')
Traceback (most recent call last):
        9: from /usr/bin/irb:11:in `<main>'
        8: from (irb):3
        7: from /usr/lib/ruby/2.5.0/uri/common.rb:275:in `join'
        6: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `join'
        5: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:89:in `inject'
        4: from /usr/lib/ruby/2.5.0/uri/generic.rb:1101:in `merge'
        3: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:117:in `convert_to_uri'
        2: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /usr/lib/ruby/2.5.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?):  http://www.reznictvi-chochola.cz/uvod.html)
irb(main):004:0> URI.join('https://example.com/', ' http://www.reznictvi-chochola.cz/uvod.html'.strip)
=> #<URI::HTTP http://www.reznictvi-chochola.cz/uvod.html>
irb(main):005:0>
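Building on the `.strip` fix, one possible hardening is to also rescue `URI::InvalidURIError`, so that a single malformed redirect target does not terminate the scanning thread. This is only a sketch under that assumption: `resolve_redirect` is a hypothetical helper, not a WhatWeb method, and returning `nil` for an unusable target is a design choice made here for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of WhatWeb): resolve a redirect target
# against the page URL, tolerating surrounding whitespace.
# Returns the absolute URL as a String, or nil if it cannot be parsed.
def resolve_redirect(base, raw_target)
  candidate = raw_target.to_s.strip
  return nil if candidate.empty?

  # URI.join handles both relative and absolute targets.
  URI.join(base, candidate).to_s
rescue URI::InvalidURIError
  # Still malformed after stripping (for example, a space inside the host).
  nil
end

base = 'http://www.reznictvi-chochola.cz/index.html'
puts resolve_redirect(base, ' http://www.reznictvi-chochola.cz/uvod.html')
puts resolve_redirect(base, ' uvod.html ')
```

A caller could treat a `nil` result as "no redirect to follow" and log the raw value, rather than letting the exception propagate.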