twhipple / Web_Scraping_Books

First attempt at using Beautiful Soup and Selenium to scrape a fun site.

Web Scraping Books

My kind of book store! Source: 'Megan Markham', unsplash.com

Intro

In this repo I plan to explore web scraping techniques in order to become more familiar with the Beautiful Soup and Selenium libraries. The website I plan to scrape was designed as a practice site, so hopefully it sticks to intentionally beginner-level concepts.

README Outline

  • Introduction
  • README Outline
  • Project Summary
  • Repo Contents
  • Libraries & Prerequisites
  • Conclusions
  • Future Work
  • Built With, Contributors, Authors, Acknowledgments

I can't imagine trying to find a book in here. Source: 'Janko Ferlic', unsplash.com

Project Summary

I found this project to be pretty challenging in the end. I spent a lot of time dealing with HTML tags and bs4.element.Tag objects, which are pretty different from some of the other coding I have done. It certainly helped to be familiar with for loops, dictionaries, and pandas dataframes.
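To give a feel for the objects mentioned above, here is a minimal sketch of inspecting a bs4.element.Tag. It is not taken from the notebook: the HTML snippet is a made-up stand-in for one book entry, and the class names (product_pod, star-rating, price_color) are assumptions about how the practice site marks up a book.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet standing in for one book entry (assumed markup,
# placeholder values).
html = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£10.00</p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
book = soup.find("article", class_="product_pod")

# find() returns a Tag (or None when nothing matches), not a plain string.
is_tag = isinstance(book, Tag)

# Attributes are read like dictionary keys; "class" comes back as a list.
rating_classes = book.find("p", class_="star-rating")["class"]
title = book.h3.a["title"]

# get_text() pulls out the visible text inside a tag.
price_text = book.find("p", class_="price_color").get_text(strip=True)
```

The part that takes getting used to is that a Tag behaves a little like a dictionary (for attributes) and a little like a tree node (for nested tags), so the same object is queried in two different ways.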

Repo Contents

This repo contains the following:

  • README.md - this is where you are now!
  • Web_Scraping_Books.ipynb - the Jupyter Notebook containing the finalized code for this project.
  • LICENSE - the required license information.
  • website url - "http://books.toscrape.com/index.html"
  • CONTRIBUTING.md
  • Images

Libraries & Prerequisites

These are the libraries that I used in this project.

  • numpy as np
  • pandas as pd
  • matplotlib.pyplot as plt
  • %matplotlib inline (a Jupyter magic command rather than a library)

Conclusions

I was able to scrape the site and pull together a list of books with titles, prices, and ratings.
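As a rough illustration of that result, the sketch below builds a list of dictionaries from a small, made-up page. It is not the notebook's code: parse_books and RATING_WORDS are hypothetical names, the sample titles and prices are placeholders, and the markup is an assumption about how the practice site lays out each book.

```python
from bs4 import BeautifulSoup

# Made-up sample page with two entries (assumed markup, placeholder values).
html = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£10.00</p>
</article>
<article class="product_pod">
  <p class="star-rating Five"></p>
  <h3><a href="#" title="Example Book Two">Example Book ...</a></h3>
  <p class="price_color">£25.50</p>
</article>
"""

# Ratings are assumed to be encoded as a word in the class list.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_books(page_html):
    """Return one dict per book with its title, price, and rating."""
    soup = BeautifulSoup(page_html, "html.parser")
    books = []
    for article in soup.find_all("article", class_="product_pod"):
        title = article.h3.a["title"]
        price_text = article.find("p", class_="price_color").get_text(strip=True)
        price = float(price_text.lstrip("£"))
        rating_classes = article.find("p", class_="star-rating")["class"]
        # Pick the first class that is a known rating word, else None.
        rating = next(
            (RATING_WORDS[c] for c in rating_classes if c in RATING_WORDS),
            None,
        )
        books.append({"title": title, "price": price, "rating": rating})
    return books


books = parse_books(html)
```

A list of dictionaries like this can be passed directly to pd.DataFrame(books) to get one row per book.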

Future Work

There is so much more I would like to do - and so many more websites to scrape!

This is what you get when you Google 'web-scraping'. Kinda nice really. Source: Vidar Nordli Mathisen, unsplash.com

Built With:

  • Jupyter Notebook
  • Python 3.0
  • scikit-learn

Contributing

Please read CONTRIBUTING.md for details

Authors

Thomas Whipple

License

Please read LICENSE for details

Acknowledgments

Thanks to the website "http://books.toscrape.com/index.html" and to Jeff Herman for helping me out.

License: GNU General Public License v3.0
