charlesdebarros / Ruby_Web_Crawler

Web Crawler built using Ruby

Ruby_Web_Crawler

Technologies used

  • Ruby 2.7.X

  • Mechanize (gem) 2.8.X

    • If the Mechanize gem is not installed, install it by running gem install mechanize (a minimal Gemfile sketch follows this list)
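The steps below run bundle install, so the repo is expected to ship a Gemfile. A minimal one matching the gem version listed above might look like the sketch below; this is an assumption, not necessarily the exact file in the repo.

    # Gemfile -- minimal sketch matching the gem version listed above
    source 'https://rubygems.org'

    gem 'mechanize', '~> 2.8'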

In order to run the web crawler script:

  • Clone this repo
  • cd into the repo folder
  • run bundle install
  • create a urls.txt file (the crawler will look for this file name)
    • From terminal, type touch lib/urls.txt
  • open the urls.txt file and type the URL you would like to crawl
    • Example: https://github.com
    • If you are just testing, avoid a large site such as www.google.com, as parsing all the URLs found will take a long time.
    • Save the file
  • run the script by typing ruby lib/crawler.rb (a minimal sketch of such a script is shown after this list)
  • Check the URLs parsed in the lib/urls.txt file
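For reference, a crawler following these steps could look roughly like the sketch below. This is a minimal sketch assuming a single crawl pass with Mechanize, not necessarily the exact contents of lib/crawler.rb in this repo.

    # lib/crawler.rb -- minimal sketch, not the repo's exact script.
    # Reads seed URLs from lib/urls.txt, fetches each page with Mechanize,
    # and appends every absolute link it finds back to the same file.
    require 'mechanize'

    urls_file = File.join(__dir__, 'urls.txt')
    seeds = File.readlines(urls_file, chomp: true).reject(&:empty?)

    agent = Mechanize.new
    found = []

    seeds.each do |url|
      page = agent.get(url)
      page.links.each do |link|
        href = link.href.to_s
        found << href if href.start_with?('http')
      end
    rescue Mechanize::Error, SocketError => e
      warn "Skipping #{url}: #{e.message}"
    end

    # Append the parsed URLs back to lib/urls.txt
    File.open(urls_file, 'a') do |f|
      found.uniq.each { |u| f.puts(u) }
    end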

TODO

  • Refactor the script
    • Break down the crawl method.
    • Make the script create the URLs file if it does not exist
    • Ask the user to input the site to be crawled instead of having to edit the urls.txt file (see the sketch below)
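A rough, hypothetical sketch for the last two TODO items (not part of the current script) might be:

    # Hypothetical helper for the last two TODO items: create lib/urls.txt
    # if it is missing and take the start URL from user input.
    urls_file = File.join(__dir__, 'urls.txt')

    unless File.exist?(urls_file)
      print 'Enter the site to crawl (e.g. https://github.com): '
      seed = gets.chomp
      File.write(urls_file, seed + "\n")
    end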
