fortune-uwha / book-scraper

An automated web scraper tailored to collect data from Book Depository

Book Depository Scraper 📚

Supported Python versions: 3.6, 3.7, 3.8. Code style: black.

General Information

This is an automated data collection package (web scraper) tailored to scrape data from the Book Depository website for a category keyword of your choice. See the Features section for the fields it collects.

Installation

Use the pip package installer to install book scraper.

  • Install directly from the GitHub repository (the leading ! is for notebook cells; omit it in a regular shell)
!pip install git+https://github.com/fortune-uwha/book_scraper

Usage

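BooksScraper takes the number of examples to scrape and the keyword to search. It returns a pandas DataFrame with the records, with an option to export to a CSV file. The examples below also pass a third argument, True, which appears to enable that export; run help(BooksScraper) to confirm.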

  • To export raw data without cleaning:
from scraper.bookscraper import BooksScraper
scraper = BooksScraper(3000, "economics", True)
scraper.collect_information()
  • To export clean data:
from scraper.bookscraper import CleanBookScraper
scraper = CleanBookScraper(3000, "economics", True)
scraper.clean_dataframe()

For more information, run help(BooksScraper) or help(CleanBookScraper).

Extra Configuration

To use the Database class, create a PostgreSQL database on Heroku or any other platform and enter its authentication credentials in the config.py file.

  • Initialization
from database.database import Database
db = Database()
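
The README does not show the layout of config.py. As a rough sketch only, the credentials could be read from environment variables instead of being hard-coded. The variable names (DB_HOST, DB_NAME, DB_USER, DB_PASSWORD, DB_PORT) and the helper load_db_config are hypothetical and not part of the package.

```python
import os

# Hypothetical variable names; the real config.py may use different ones.
REQUIRED_KEYS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


def load_db_config(env=None):
    """Collect PostgreSQL credentials from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing database settings: " + ", ".join(missing))
    return {
        "host": env["DB_HOST"],
        "database": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        # 5432 is the default PostgreSQL port.
        "port": int(env.get("DB_PORT", 5432)),
    }
```

Failing early on a missing setting keeps a misconfigured deployment from reaching the database code with incomplete credentials.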

Example functions

  • These functions will be executed by running main.py. Feel free to edit the variables to suit your requirements.
    • delete_tables() - Drops categories and books tables. Handle with care - this will destroy your data!
    • create_tables() - Creates categories and books tables and sets up foreign keys.
    • insert_data_into_db(dataframe, category) - Inserts the data from the dataframe into the database.
    • export_to_csv() - Fetches the data from the database and exports it as a .csv file.
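
The flow these functions describe can be sketched end to end. This is not the package's Database class: it substitutes the standard-library sqlite3 module for PostgreSQL so it runs without credentials, and the column layout, the extra conn and out parameters, and the use of plain tuples instead of a dataframe are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def create_tables(conn):
    """Create categories and books, linking books to categories by foreign key."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS categories ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT, price REAL, "
        "category_id INTEGER NOT NULL REFERENCES categories(id))"
    )


def delete_tables(conn):
    """Drop both tables; books goes first because it references categories."""
    conn.execute("DROP TABLE IF EXISTS books")
    conn.execute("DROP TABLE IF EXISTS categories")


def insert_data_into_db(conn, rows, category):
    """Insert (title, author, price) rows under the given category name."""
    conn.execute("INSERT OR IGNORE INTO categories (name) VALUES (?)", (category,))
    (category_id,) = conn.execute(
        "SELECT id FROM categories WHERE name = ?", (category,)
    ).fetchone()
    conn.executemany(
        "INSERT INTO books (title, author, price, category_id) VALUES (?, ?, ?, ?)",
        [(title, author, price, category_id) for title, author, price in rows],
    )
    conn.commit()


def export_to_csv(conn, out):
    """Write every book, joined with its category name, to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["title", "author", "price", "category"])
    writer.writerows(
        conn.execute(
            "SELECT b.title, b.author, b.price, c.name "
            "FROM books b JOIN categories c ON c.id = b.category_id ORDER BY b.id"
        )
    )


conn = sqlite3.connect(":memory:")
create_tables(conn)
# Placeholder row for illustration, not scraped data.
insert_data_into_db(conn, [("Example Title", "Example Author", 9.99)], "economics")
buffer = io.StringIO()
export_to_csv(conn, buffer)
```

Keeping categories in their own table means each category name is stored once and every book row points to it by id, which is what the foreign key set up by create_tables() enforces.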

Features

Based on the specified category, BooksScraper collects information on:

  • Book title
  • Book author
  • Book price
  • Book edition
  • Book publish date
  • Book category
  • Book item URL
  • Book image URL
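
To make the shape of one record concrete, the eight fields above can be written as a typed tuple. This is only an illustrative sketch: the class name BookRecord, the field names, and the field types are assumptions, and the actual DataFrame columns returned by BooksScraper may be named differently.

```python
from typing import NamedTuple, Optional


class BookRecord(NamedTuple):
    """One scraped book; field names are illustrative, not the package's columns."""

    title: str
    author: str
    price: Optional[float]  # None when no price could be scraped
    edition: str
    publish_date: str
    category: str
    item_url: str
    image_url: str


# Placeholder values for illustration only.
record = BookRecord(
    title="Example Title",
    author="Example Author",
    price=None,
    edition="Paperback",
    publish_date="2020-01-01",
    category="economics",
    item_url="https://example.com/book",
    image_url="https://example.com/book.jpg",
)
# A mapping keyed by field name, convenient for building a table row.
row = record._asdict()
```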

Project Status

Project is: in progress

Acknowledgements

This project is based on Turing College learning material on SQL and data scraping.

Contact

Created by @fortune_uwha - feel free to contact me!

License

This project is open source and available under the terms of the MIT license.
