README

Wanted to investigate the numbers behind Goodreads' categories and play a little with Python's HTML parsing libraries (Beautiful Soup in this case).

To download the book category HTML pages from Goodreads:

./download_script

Then to retrieve the data and populate a CSV with it:

./assemble_csv

or to do both:

./download_script && ./assemble_csv
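
A minimal sketch of what the retrieval step could look like with Beautiful Soup, not the actual assemble_csv code. The sample markup, the CSS class names (category, name, count), the CSV column names and the helper functions extract_categories and write_csv are all made up for illustration; the real Goodreads pages are likely structured differently.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for one downloaded page.
# The real Goodreads HTML may use different tags and class names.
SAMPLE_HTML = """
<div class="category">
  <a class="name" href="/genres/fantasy">Fantasy</a>
  <span class="count">1,234 books</span>
</div>
<div class="category">
  <a class="name" href="/genres/poetry">Poetry</a>
  <span class="count">56 books</span>
</div>
"""


def extract_categories(html):
    """Return (name, book_count) pairs found in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.select("div.category"):
        name = block.select_one("a.name").get_text(strip=True)
        count_text = block.select_one("span.count").get_text(strip=True)
        # Turn a string like "1,234 books" into the integer 1234.
        count = int(count_text.split()[0].replace(",", ""))
        rows.append((name, count))
    return rows


def write_csv(rows, out):
    """Write a header line followed by one line per category."""
    writer = csv.writer(out)
    writer.writerow(["category", "books"])
    writer.writerows(rows)


rows = extract_categories(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for an open CSV file
write_csv(rows, buffer)
print(buffer.getvalue())
```

To run this over real files, one would read each file in list_html, pass its text to extract_categories, and write to a file opened with newline="" instead of the in-memory buffer.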

Folders

examples: Diagrams of the most and least popular categories (made after placing the generated CSV into a Google Docs spreadsheet).

list_html: Downloaded files. The folder's content is committed in case anyone wants to experiment without retrieving the data.

Did not explore the Goodreads API, as I was more interested in experimenting with web scraping.

First web scraping experiment (shell and Python)

Languages: Python 86.1%, Shell 13.9%