Book Depository Scraper 📚

General Information

This is an automated data collection package (web scraper) tailored to collect book data from the Book Depository website for a category keyword of your choice. See the Features section for details.

Installation

Use the package installer pip to install the book scraper.

Install directly from the GitHub repository (the leading ! is for notebook cells; omit it in a terminal):

!pip install git+https://github.com/fortune-uwha/book_scraper

Usage

BooksScraper takes the number of records to scrape and the keyword to search for, and returns a pandas DataFrame of the records, with an option to export to a CSV file. The examples below also pass a third boolean argument, which appears to control that export.

To export raw data without cleaning:

from scraper.bookscraper import BooksScraper
scraper = BooksScraper(3000, "economics", True)
scraper.collect_information()

To export clean data:

from scraper.bookscraper import CleanBookScraper
scraper = CleanBookScraper(3000, "economics", True)
scraper.clean_dataframe()

For more information, run help(BooksScraper) or help(CleanBookScraper).

Extra Configuration

To use the Database class, create a PostgreSQL database on Heroku or any other platform and enter the authentication credentials in the config.py file.
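As a rough illustration of what such a credentials file could hold, here is a minimal sketch that reads connection settings from environment variables and builds a libpq-style connection string. The variable names (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD), the defaults, and both helper functions are assumptions for illustration; the project's actual config.py may be laid out differently.

```python
import os


def load_db_settings(env=os.environ):
    """Collect PostgreSQL connection settings from a mapping of variables.

    All variable names and defaults here are hypothetical, not taken
    from the project's config.py.
    """
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "books"),
        "user": env.get("DB_USER", "postgres"),
        "password": env.get("DB_PASSWORD", ""),
    }


def to_dsn(settings):
    """Join non-empty settings into a keyword=value connection string.

    Values containing spaces would need quoting; this sketch assumes
    they contain none.
    """
    return " ".join(
        f"{key}={value}" for key, value in settings.items() if value != ""
    )
```

Reading credentials from the environment keeps passwords out of version control, which matters if the repository is public.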

Initialization

from database.database import Database
db = Database()

Example functions

These functions will be executed by running main.py. Feel free to edit the variables to suit your requirements.
- delete_tables() - Drops categories and books tables. Handle with care - this will destroy your data!
- create_tables() - Creates categories and books tables and sets up foreign keys.
- insert_data_into_db(dataframe, category) - Inserts the data from the dataframe into the database.
- export_to_csv() - Fetches the data from the database and exports it as a .csv file.
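To make the roles of these four functions concrete, here is a minimal sketch of the same workflow. It is not the project's implementation: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL, takes a list of dicts instead of a DataFrame, and assumes a hypothetical schema (title, price, category_id). The explicit conn and out parameters are also additions for the sketch.

```python
import csv
import io
import sqlite3


def create_tables(conn):
    """Create categories and books, with books referencing categories."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled per connection
    conn.execute(
        "CREATE TABLE IF NOT EXISTS categories ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL, "
        "category_id INTEGER NOT NULL REFERENCES categories(id))"
    )


def delete_tables(conn):
    """Drop both tables; books goes first because it references categories."""
    conn.execute("DROP TABLE IF EXISTS books")
    conn.execute("DROP TABLE IF EXISTS categories")


def insert_data_into_db(conn, rows, category):
    """Insert book rows (dicts with 'title' and 'price') under one category."""
    conn.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)", (category,))
    category_id = conn.execute(
        "SELECT id FROM categories WHERE name = ?", (category,)
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO books (title, price, category_id) VALUES (?, ?, ?)",
        [(row["title"], row["price"], category_id) for row in rows],
    )
    conn.commit()


def export_to_csv(conn, out):
    """Write all books joined with their category name to a file-like object."""
    cursor = conn.execute(
        "SELECT b.title, b.price, c.name FROM books b "
        "JOIN categories c ON c.id = b.category_id ORDER BY b.id"
    )
    writer = csv.writer(out)
    writer.writerow(["title", "price", "category"])
    writer.writerows(cursor)
```

A quick run against an in-memory database could look like this:

```python
conn = sqlite3.connect(":memory:")
create_tables(conn)
insert_data_into_db(conn, [{"title": "Example Book", "price": 12.0}], "economics")
buffer = io.StringIO()
export_to_csv(conn, buffer)
print(buffer.getvalue())
```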

Features

Based on the specified category, BooksScraper collects information on:

Project Status

Project is: in progress

Acknowledgements

This project is based on Turing College coursework on SQL and data scraping.

Contact

Created by @fortune_uwha - feel free to contact me!

License

This project is open source and available under the terms of the MIT license.
