p371k9 / quotesjs

Scrapy vs. saved HTMLs.

quotesjs

Crawl and scrape data from saved HTML files. The HTML files are saved to the pages/ directory with the "foxdom" script. There are other ways to save web pages, such as the Firefox extension Save Page WE. HTML files are processed in the order of their file names.
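The repository's spider code is not shown here. As a rough sketch of the "processed in the order they are named" rule, the function below lists the saved pages in a directory, sorts them by file name, and turns them into file:// URLs. This is one way a Scrapy spider could build its start URLs from the directory passed with `-a dir=pages`. The name `html_file_urls` is hypothetical, and matching only `.html`/`.htm` suffixes is an assumption.

```python
from pathlib import Path


def html_file_urls(directory: str) -> list[str]:
    """Return file:// URLs for the .html/.htm files in `directory`, sorted by file name.

    Hypothetical helper: the repo's own spiders may collect files differently.
    """
    paths = sorted(
        (
            p
            for p in Path(directory).iterdir()
            if p.is_file() and p.suffix.lower() in {".html", ".htm"}
        ),
        key=lambda p: p.name,  # plain string order of the file names
    )
    # Path.as_uri() only accepts absolute paths, so resolve each one first.
    return [p.resolve().as_uri() for p in paths]
```

Sorting by name is lexicographic, so `page10.html` sorts before `page2.html`; zero-padded names such as `page02.html` keep the intended order.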

Target: http://quotes.toscrape.com/js/

Scrape page URLs from the HTML files:

scrapy crawl url -a dir=pages -o urls.lll
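The README does not say exactly which URLs the `url` spider extracts. Assuming it collects link targets from each saved page, the standard-library sketch below gathers `<a href>` values in document order, resolves relative links against the target site `http://quotes.toscrape.com/js/`, and drops duplicates. `LinkCollector` and `extract_urls` are hypothetical names. Inside a Scrapy spider, the same job would more likely be done with `response.css("a::attr(href)")` and `response.urljoin()`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags, in order, without duplicates."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve against the original site rather than the local file path.
            url = urljoin(self.base_url, href)
            if url not in self.urls:
                self.urls.append(url)


def extract_urls(html: str, base_url: str = "http://quotes.toscrape.com/js/") -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Resolving against the site URL matters for saved pages: a relative link such as `/js/page/2/` would otherwise point into the local file system.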

The .lll extension is configured to export a headerless .csv file (no header row).
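In Scrapy, the file extension given to `-o` selects the feed format, and custom formats can be registered through the `FEED_EXPORTERS` setting. `CsvItemExporter` accepts `include_headers_line=False` to skip the header row. Whether the repo wires up `.lll` exactly this way is an assumption. The standard-library sketch below only illustrates the resulting output: data rows with no header line. `to_headerless_csv` is a hypothetical helper.

```python
import csv
import io


def to_headerless_csv(items: list[dict], fields: list[str]) -> str:
    """Serialize dict items as CSV rows in `fields` order, without a header line."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys not listed in `fields`.
    # writeheader() is deliberately never called, so only data rows are written.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    for item in items:
        writer.writerow(item)
    return buf.getvalue()
```

With a single `url` field, each line is usually just one URL. Values that contain commas or quotes are still CSV-quoted.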

Scrape data from the HTML files:

scrapy crawl page -a dir=pages -o quotes.csv
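The `page` spider's selectors are not shown in the README. The sketch below assumes the saved, JavaScript-rendered pages use the same quote markup as the non-JS version of the site: a `div.quote` containing `span.text`, `small.author` and `a.tag` elements. It writes one CSV row per quote with a header line and joins multiple tags with commas. `QuoteParser` and `quotes_to_csv` are hypothetical names. Inside a Scrapy spider, the same fields would normally come from CSS selectors.

```python
import csv
import io
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collect {text, author, tags} dicts from div.quote blocks (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.quotes: list[dict] = []
        self._current = None  # quote currently being built, or None outside a quote
        self._depth = 0       # <div> nesting depth inside the current quote
        self._field = None    # which field incoming text belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None:
            if tag == "div" and "quote" in classes:
                self._current = {"text": "", "author": "", "tags": []}
                self._depth = 1
            return
        if tag == "div":
            self._depth += 1
        elif tag == "span" and "text" in classes:
            self._field = "text"
        elif tag == "small" and "author" in classes:
            self._field = "author"
        elif tag == "a" and "tag" in classes:
            self._field = "tag"
            self._current["tags"].append("")

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in ("span", "small", "a"):
            self._field = None
        elif tag == "div":
            self._depth -= 1
            if self._depth == 0:  # closing </div> of the quote itself
                self.quotes.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._current is None or self._field is None:
            return
        if self._field == "tag":
            self._current["tags"][-1] += data
        else:
            self._current[self._field] += data


def quotes_to_csv(html: str) -> str:
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["text", "author", "tags"], lineterminator="\n")
    writer.writeheader()
    for q in parser.quotes:
        writer.writerow({**q, "tags": ",".join(q["tags"])})
    return buf.getvalue()
```

Tracking `<div>` depth keeps tag links outside a quote block from being attached to the last quote.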

Languages

HTML 90.4%, Python 9.6%