charlesdebarros / Ruby_Web_Crawler

Web Crawler built using Ruby

Ruby_Web_Crawler

Technologies used

  • Ruby 2.7.X

  • Mechanize (gem) 2.8.X

    • If the Mechanize gem is not installed, install it by running gem install mechanize (a minimal Gemfile sketch follows this list)
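The steps below run bundle install, so the repo is expected to ship a Gemfile. A minimal one matching the gem version listed above might look like the sketch below; this is an assumption, not necessarily the exact file in the repo.

    # Gemfile -- minimal sketch matching the gem version listed above
    source 'https://rubygems.org'

    gem 'mechanize', '~> 2.8'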

In order to run the web crawler script:

  • Clone this repo
  • cd into the repo folder
  • run bundle install
  • create a urls.txt file (the crawler will look for this file name)
    • From terminal, type touch lib/urls.txt
  • open the urls.txt file and type the URL you would like to crawl
    • Example: https://github.com
    • If you are just testing, avoid a large site such as www.google.com, as parsing all the URLs found will take a long time.
    • Save the file
  • run the script by typing ruby lib/crawler.rb (a minimal sketch of such a script is shown after this list)
  • Check the URLs parsed in the lib/urls.txt file
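For reference, a crawler following these steps could look roughly like the sketch below. This is a minimal sketch assuming a single crawl pass with Mechanize, not necessarily the exact contents of lib/crawler.rb in this repo.

    # lib/crawler.rb -- minimal sketch, not the repo's exact script.
    # Reads seed URLs from lib/urls.txt, fetches each page with Mechanize,
    # and appends every absolute link it finds back to the same file.
    require 'mechanize'

    urls_file = File.join(__dir__, 'urls.txt')
    seeds = File.readlines(urls_file, chomp: true).reject(&:empty?)

    agent = Mechanize.new
    found = []

    seeds.each do |url|
      page = agent.get(url)
      page.links.each do |link|
        href = link.href.to_s
        found << href if href.start_with?('http')
      end
    rescue Mechanize::Error, SocketError => e
      warn "Skipping #{url}: #{e.message}"
    end

    # Append the parsed URLs back to lib/urls.txt
    File.open(urls_file, 'a') do |f|
      found.uniq.each { |u| f.puts(u) }
    end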

TODO

  • Refactor the script
    • Break down the crawl method.
    • Make the script create the URLs file if it does not exist
    • Ask the user to input the site to be crawled instead of having to edit the urls.txt file (see the sketch below)
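A rough, hypothetical sketch for the last two TODO items (not part of the current script) might be:

    # Hypothetical helper for the last two TODO items: create lib/urls.txt
    # if it is missing and take the start URL from user input.
    urls_file = File.join(__dir__, 'urls.txt')

    unless File.exist?(urls_file)
      print 'Enter the site to crawl (e.g. https://github.com): '
      seed = gets.chomp
      File.write(urls_file, seed + "\n")
    end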
