1N3 / BlackWidow

A Python-based web application scanner to gather OSINT and fuzz for OWASP vulnerabilities on a target website.

Home Page: https://sn1persecurity.com


Unreachable code

delirious-lettuce opened this issue · comments


if link.get('href')[:4] == "http":

To get inside this if statement, link.get('href') must start with http, so it can never start with #, tel:, or mailto:.

Since link.get('href') is used so frequently in this function, a better option would be to assign it to a variable, e.g. current_link = link.get('href'), and to use Python's str.startswith instead of slicing:

>>> link = {'href': 'http://www.google.com'}
>>> current_link = link.get('href')
>>> current_link[:4] == 'http'
True
>>> current_link.startswith('http')
True
>>> current_link.startswith(('#', 'tel:', 'mailto:'))
False
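Taking that one step further, the repeated slice comparisons could be collapsed into a single helper. A hedged sketch (classify_href is a hypothetical name, not part of BlackWidow):

```python
def classify_href(href):
    """Label an href by its prefix, using str.startswith instead of
    slice comparisons like href[:4] == 'tel:'. Order matters: the
    special prefixes are tested before the generic http check."""
    if href.startswith('#'):
        return 'dom'
    if href.startswith('tel:'):
        return 'telephone'
    if href.startswith('mailto:'):
        return 'email'
    if href.startswith(('http://', 'https://')):
        return 'url'
    return 'other'
```

Note that startswith also accepts a tuple of prefixes, which is how the `('#', 'tel:', 'mailto:')` check above works in one call.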

I commented out these two seemingly unreachable sections to highlight them.

if link.get('href')[:4] == "http":
  # SAME ORIGIN
  if domain in link.get('href'):
    # IF URL IS DYNAMIC
    if "?" in link.get('href'):
      print OKRED + "[+] Dynamic URL found! " + link.get('href') + " " + RESET
      urls.write(link.get('href') + "\n")
      urls_saved.write(link.get('href') + "\n")
      dynamic_saved.write(link.get('href') + "\n")
    # # DOM BASED LINK
    # elif link.get('href')[:1] == "#":
    #   print OKBLUE + "[i] DOM based link found! " + link.get('href') + " " + RESET
    # # TELEPHONE
    # elif link.get('href')[:4] == "tel:":
    #   s = link.get('href')
    #   phonenum = s.split(':')[1]
    #   print OKORANGE + "[i] Telephone # found! " + phonenum + " " + RESET
    #   phones_saved.write(phonenum + "\n")
    # # EMAIL
    # elif link.get('href')[:7] == "mailto:":
    #   s = link.get('href')
    #   email = s.split(':')[1]
    #   print OKORANGE + "[i] Email found! " + email + " " + RESET
    #   emails_saved.write(email + "\n")

    # FULL URI OF SAME ORIGIN
    else:
      print link.get('href')
      urls.write(link.get('href') + "\n")
      urls_saved.write(link.get('href') + "\n")
  # EXTERNAL LINK FOUND
  else:
    # IF URL IS DYNAMIC
    if "?" in link.get('href'):
      print COLOR2 + "[+] External Dynamic URL found! " + link.get('href') + " " + RESET
    # # DOM BASED LINK
    # elif link.get('href')[:1] == "#":
    #   print COLOR2 + "[i] External DOM based link found! " + link.get('href') + " " + RESET
    # # TELEPHONE
    # elif link.get('href')[:4] == "tel:":
    #   s = link.get('href')
    #   phonenum = s.split(':')[1]
    #   print OKORANGE + "[i] External Telephone # found! " + phonenum + " " + RESET
    # # EMAIL
    # elif link.get('href')[:7] == "mailto:":
    #   s = link.get('href')
    #   email = s.split(':')[1]
    #   print OKORANGE + "[i] External Email found! " + email + " " + RESET

    # FULL URI OF EXTERNAL ORIGIN
    else:
      print COLOR2 + "[i] External link found! " + link.get('href') + " " + RESET
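One way to make those commented-out branches reachable is to test the special prefixes before the http check. A rough sketch (the function name is mine, and the print/write side effects are replaced by a return value so the logic can be tested in isolation):

```python
def handle_link(current_link, domain):
    """Hypothetical rework of the branch order: DOM, tel: and mailto:
    links are checked *before* the http test, so every branch is
    reachable. Returns a (category, payload) tuple instead of
    printing/writing files."""
    if current_link.startswith('#'):
        return ('dom', current_link)
    if current_link.startswith('tel:'):
        # split(':', 1) keeps everything after the first colon
        return ('telephone', current_link.split(':', 1)[1])
    if current_link.startswith('mailto:'):
        return ('email', current_link.split(':', 1)[1])
    if current_link.startswith('http'):
        origin = 'same' if domain in current_link else 'external'
        kind = 'dynamic' if '?' in current_link else 'static'
        return (origin + '-' + kind, current_link)
    return ('other', current_link)
```

Each caller would then decide what to print or write based on the category, keeping the classification in one place.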

Thanks for the heads up. I removed the affected lines from the master branch, so it should be good now.


@1N3 ,

No problem, I just wasn't sure which way you wanted to go with it (deleting the sections outright or re-working them some other way).

I've also been experimenting with using a scrapy spider to get the links instead of how it is currently being done.

@delirious-lettuce, cool. If there's a better way to do it, I'd be interested to see it.