skandaloptagon / AIC_HW_1-Crawler

Feel free to use any open source crawler as the code base. Write your focused crawler, such as crawling only CoC website or crawling only healthcare web pages. You are expected to crawl at least 1000 pages.

Setup

Create a virtual environment with virtualenv venv, activate it with source venv/bin/activate, then install the dependencies with pip install -r requirements.txt.

How to run

Change into the project directory with cd coc_crawler, then start the crawl with scrapy crawl coc (the command is lowercase).
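The crawl is meant to be focused, that is, to stay on a single site. As a rough illustration of that idea, here is a minimal sketch of a scope check in plain Python. It is not the code from this repository: the domain example.edu and the helper names in_scope and filter_links are hypothetical. In Scrapy itself, the spider's allowed_domains attribute serves a similar purpose.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical domain for illustration; the domain the real spider uses is not stated here.
ALLOWED_DOMAIN = "example.edu"


def in_scope(url, allowed_domain=ALLOWED_DOMAIN):
    """Return True if url is http(s) and on allowed_domain or one of its subdomains."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return host == allowed_domain or host.endswith("." + allowed_domain)


def filter_links(page_url, hrefs, allowed_domain=ALLOWED_DOMAIN):
    """Resolve hrefs against page_url and keep in-scope links, without duplicates."""
    seen = set()
    kept = []
    for href in hrefs:
        # Make the link absolute and drop any #fragment so duplicates collapse.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        if absolute not in seen and in_scope(absolute, allowed_domain):
            seen.add(absolute)
            kept.append(absolute)
    return kept
```

A crawler built this way would only enqueue the links returned by filter_links, so off-site pages and non-HTTP links such as mailto: addresses are never requested.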

Results

Viewing results

Running the crawl creates a file called items.csv in coc_crawler. It contains all the crawled page links, each with its reference link. Stats appear in the terminal before, during, and after execution.
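To check the output against the 1000-page requirement, something like the following sketch could be used. It assumes items.csv has a header row and that the two columns are named url and referer; those column names are guesses, so adjust them to match the actual file. The function name summarize is hypothetical as well.

```python
import csv


def summarize(csv_file, url_field="url", referer_field="referer", target=1000):
    """Count rows, distinct page links, and distinct reference links in a crawl CSV.

    csv_file is an open text file (or any iterable of lines) with a header row.
    The default column names are assumptions, not taken from the project.
    """
    rows = 0
    pages = set()
    referers = set()
    for row in csv.DictReader(csv_file):
        rows += 1
        pages.add(row[url_field])
        # The first page crawled may have no reference link, so skip empty values.
        if row.get(referer_field):
            referers.add(row[referer_field])
    return {
        "rows": rows,
        "unique_pages": len(pages),
        "unique_referers": len(referers),
        "meets_target": len(pages) >= target,
    }
```

For example, from the repository root: open coc_crawler/items.csv with open("coc_crawler/items.csv", newline="") and pass the file object to summarize, then print the returned dictionary.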


Languages

TeX 89.5%, Python 10.4%, Shell 0.0%