dimitrismistriotis / goodreads_categories_scrapping

First web scraping experiment (shell and Python)

README

Wanted to investigate the numbers behind Goodreads' book categories and play a little with Python's HTML parsing libraries (Beautiful Soup in this case).

To download the book category HTML pages from Goodreads:

./download_script
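download_script is a shell script whose contents aren't reproduced in this README. For readers who prefer Python, here is a minimal sketch of the same step using only the standard library's urllib.request. The URL template, the page numbers and the page_<n>.html file names are hypothetical, not taken from the script. The fetcher is passed in as a parameter so the saving logic can be tried without network access, and a short pause between requests keeps the scraping polite.

```python
import pathlib
import time
import urllib.request
from typing import Callable, Iterable, List

# Hypothetical URL template: the addresses download_script actually requests are not listed here.
URL_TEMPLATE = "https://www.goodreads.com/genres/list?page={page}"


def fetch(url: str) -> bytes:
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_pages(
    pages: Iterable[int],
    out_dir: str = "list_html",
    fetcher: Callable[[str], bytes] = fetch,
    delay: float = 1.0,
) -> List[pathlib.Path]:
    """Save each page as out_dir/page_<n>.html (hypothetical naming scheme)."""
    target = pathlib.Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in pages:
        path = target / f"page_{page}.html"
        path.write_bytes(fetcher(URL_TEMPLATE.format(page=page)))
        saved.append(path)
        time.sleep(delay)  # pause between requests so the site isn't hammered
    return saved
```

For example, `download_pages(range(1, 6))` would save five pages into list_html, assuming the hypothetical URL template matches the site.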

Then, to extract the data and populate a CSV file with it:

./assemble_csv
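A rough sketch of what the assemble step involves: parse every saved page, pull out each category and its count, and write the results as CSV. The project itself uses Beautiful Soup; this sketch swaps in the standard library's html.parser so it runs without extra packages. The markup it looks for (an `a` element with class `genre` followed by a `span` with class `count`), the `category`/`books` column names and the categories.csv file name are all hypothetical, since the real Goodreads markup and CSV layout aren't described here.

```python
import csv
import io
import pathlib
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Row = Tuple[str, int]


class CategoryParser(HTMLParser):
    """Collect (category, count) pairs from hypothetical markup such as
    <a class="genre">Fiction</a> <span class="count">1,234 books</span>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Row] = []
        self._field: Optional[str] = None  # "name" or "count" while inside a matching element
        self._pending_name: Optional[str] = None  # category name waiting for its count

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "genre" in classes:
            self._field = "name"
        elif tag == "span" and "count" in classes:
            self._field = "count"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._pending_name = text
        elif self._pending_name is not None:
            digits = re.sub(r"\D", "", text)  # "1,234 books" -> "1234"
            if digits:
                self.rows.append((self._pending_name, int(digits)))
            self._pending_name = None


def parse_categories(html: str) -> List[Row]:
    """Return every (category, count) pair found in one page."""
    parser = CategoryParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.rows


def rows_to_csv(rows: List[Row]) -> str:
    """Render rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["category", "books"])
    writer.writerows(rows)
    return buffer.getvalue()


def assemble(html_dir: str = "list_html", csv_path: str = "categories.csv") -> int:
    """Parse every saved page into one CSV file; return the number of rows written."""
    rows: List[Row] = []
    for path in sorted(pathlib.Path(html_dir).glob("*.html")):
        rows.extend(parse_categories(path.read_text(encoding="utf-8")))
    # newline="" stops the csv module's \r\n line endings from being translated again
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        handle.write(rows_to_csv(rows))
    return len(rows)
```

Keeping the parsing separate from the file handling means `parse_categories` can be checked against a small HTML snippet before pointing `assemble` at the whole list_html folder.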

or to do both:

./download_script && ./assemble_csv

Folders

examples: Charts of the most and least popular categories, made by importing the generated CSV into a Google Docs spreadsheet.
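The charts in examples were made in a spreadsheet, but the ranking behind them can also be computed straight from the CSV. This is a minimal sketch that assumes popularity means the per-category count and that the CSV has a header row with `category` and `books` columns. Both are hypothetical, since the exact layout produced by assemble_csv isn't documented here.

```python
import csv
import io
from typing import List, Tuple

Ranked = List[Tuple[str, int]]


def most_and_least_popular(csv_text: str, n: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the n categories with the highest counts and the n with the lowest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["category"], int(row["books"])) for row in reader]
    by_count = sorted(rows, key=lambda row: row[1])  # ascending by count
    return by_count[::-1][:n], by_count[:n]
```

To use it on a real file, read the file opened with `newline=""` and pass the text in. The two lists come back ordered from most to fewest and from fewest to most, ready to paste into a chart.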

list_html: The downloaded HTML files. The folder's contents are committed in case anyone wants to experiment without retrieving the data.

Notes

Did not explore the Goodreads API, as I was more interested in experimenting with web scraping.

Languages

Python 86.1%, Shell 13.9%