askrht / costco-scrape

Costco Scrape

This web scraper uses the BeautifulSoup and Selenium WebDriver libraries to fetch the following data from a Costco product page and load it into a CSV file:

  • SEO Meta Tags
  • Product Name
  • Product Description
  • Product Specifications
  • Category
  • Price
  • Embedded images
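
As a rough illustration of the CSV loading step, the sketch below writes one row per product using Python's standard csv module. The column names, the write_rows helper, and the sample values are assumptions made for illustration; the real layout of the output file is whatever the script defines.

```python
import csv

# Hypothetical column names mirroring the fields listed above.
FIELDS = ["meta_tags", "name", "description", "specifications",
          "category", "price", "images"]


def write_rows(path, rows):
    """Write a header line plus one CSV row per product dictionary."""
    # newline="" lets the csv module control line endings (avoids blank rows on Windows).
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval="" fills any field a product dictionary does not provide.
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_rows("example_output.csv", [{"name": "Sample product", "price": "9.99"}])
```

Fields that could not be scraped are written as empty cells, so every row keeps the same number of columns.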

This script ONLY works for the Costco website. It will break for any other website.
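
Because of this restriction, it can help to check links before scraping them. The following is a minimal sketch, assuming Costco product pages are served from costco.com or one of its subdomains; the is_costco_url helper is hypothetical and is not part of the script.

```python
from urllib.parse import urlparse


def is_costco_url(url):
    """Return True if the URL's host is costco.com or a subdomain of it."""
    # hostname is None for strings that are not absolute URLs.
    host = (urlparse(url.strip()).hostname or "").lower()
    return host == "costco.com" or host.endswith(".costco.com")


if __name__ == "__main__":
    for link in ["https://www.costco.com/", "https://example.com/"]:
        print(link, is_costco_url(link))
```

Comparing the parsed host, rather than searching the whole string for "costco", avoids accepting links that only mention Costco in their path.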

Getting Started

These instructions will get you a copy of the project up and running on your local machine for testing purposes.

Prerequisites:

  • Python 3.7.0. During installation, select "Default Path" and check the box to install pip as well

    Once Python 3.7 is installed:

    • Webdriver. Please install the Chrome version!

      (Copy the path of the installed webdriver! You will need it during Set Up!)
    • Selenium:

      pip install selenium

    • BeautifulSoup:

      pip install beautifulsoup4
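
To confirm the prerequisites are in place, a small check like the one below can be run before the scraper. This is only a sketch: the missing_modules helper is hypothetical, and it relies on the fact that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util
import sys


def missing_modules(names=("selenium", "bs4")):
    """Return the module names from `names` that cannot be found on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If a module is reported missing, rerun the matching pip install command above.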

Set Up:

  1. In the DriverPath.txt file, paste the path of the webdriver you installed above, for example:

    C:\Users\DAE\Downloads\Chromedriver

  2. If you installed a driver other than Chrome, open Scrape.py and do the following:

    On line 27, by default there is driver = webdriver.Chrome(path_to_driver)

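    • For Firefox: driver = webdriver.Firefox(executable_path=path_to_driver)
    • For Safari: driver = webdriver.Safari(executable_path=path_to_driver)

    (In Selenium 3, only webdriver.Chrome takes the driver path as its first positional argument, so pass it by keyword for the other browsers.)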
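
For illustration, the driver path from step 1 could be read from DriverPath.txt along these lines. How the script actually reads the file is not shown here, so treat the read_driver_path helper below as a hypothetical sketch rather than the script's own code.

```python
from pathlib import Path


def read_driver_path(config_file="DriverPath.txt"):
    """Return the first non-empty line of the file, without quotes or padding."""
    # "utf-8-sig" also accepts files saved with a byte-order mark (common on Windows).
    text = Path(config_file).read_text(encoding="utf-8-sig")
    for line in text.splitlines():
        cleaned = line.strip().strip('"').strip()
        if cleaned:
            return cleaned
    raise ValueError(f"{config_file} does not contain a driver path")
```

Stripping quotes and surrounding whitespace guards against a path that was pasted with "Copy as path" style quoting or a trailing newline.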

Running:

For every iteration of scraping:

  1. In the URLS.txt file, delete all the current URLs

  2. Paste 10 new links, each on its own line, without quotation marks

  3. On the command line, go to the directory of the GitHub repository by running:

    cd /d C:\Users\DAE\Documents\CostcoScrape\costco-scrape-master

  4. On the command line, start the script by running:

    python scrape.py

  5. The script should run without any errors. If it does not, recheck steps 2-3.

  6. Open the OutputData.csv file and voila, all the data from the above 10 links is loaded!

  7. Congratulations!
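
The URLS.txt format described in steps 1-2 (one link per line, no quotation marks) can be read with a few lines of Python. The sketch below is an assumption about how such a file might be parsed, not the script's actual code; the read_urls helper is hypothetical and tolerates blank lines and stray quotes.

```python
def read_urls(path="URLS.txt"):
    """Return the non-empty lines of the file as a list of URLs."""
    urls = []
    # "utf-8-sig" also accepts files saved with a byte-order mark.
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            # Drop surrounding whitespace and any accidental quotation marks.
            url = line.strip().strip("\"'")
            if url:
                urls.append(url)
    return urls
```

Printing len(read_urls()) before a run is a quick way to confirm that all 10 links were pasted correctly.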

Authors:

  • CHUDDY
