JuJu2181 / Amazon-Laptop-Data-Scraping-And-Visualization

First group project for CodeRush Data Engineering Apprenticeship Program

CodeRush Data Engineering Apprenticeship Program

Group Project 1 ~ Amazon Web Scraping And Visualization

Team Members (Group D)

  • Anish Shilpakar
  • Shyamron Dongol
  • Shivaji Pandit Chhetri
  • Amit Duwal

Tasks Completed:

  • Use Scrapy to collect data from Amazon
  • Create a plot of price vs. products (using group by)
  • Prepare a report on the findings (Patterns)
  • Prepare Project Documentation
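
The "price vs. products (group by)" task can be illustrated with a small Pandas sketch. This is not the notebook's actual code: the column names `brand` and `price` are assumptions, and the rows are made-up toy values used only to show the aggregation.

```python
import pandas as pd

# Toy rows standing in for the cleaned laptop data.
# Column names and values are illustrative assumptions.
df = pd.DataFrame(
    {
        "brand": ["Acer", "Acer", "Dell", "HP"],
        "price": [400.0, 600.0, 900.0, 700.0],
    }
)

# Group the listings by brand and average the price within each group,
# then sort so the most expensive brand comes first.
avg_price = (
    df.groupby("brand")["price"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_price)
```

With Matplotlib installed, calling `avg_price.plot.bar()` on the result would draw the grouped prices as a bar chart.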

Tools and Technologies Used

  • Python
  • Scrapy
  • Pandas
  • Matplotlib
  • Seaborn

Data Description

  • data.csv > Original raw data
  • laptop_data_cleaned.csv > Cleaned data
  • new_data_collected.csv > New extra data collected for some analysis
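
How the raw file was turned into the cleaned one is not described here. As a hedged illustration of one typical cleaning step, scraped price strings usually need to be converted to numbers before plotting. The helper name `parse_price` and the input formats below are assumptions, not taken from the project.

```python
import re


def parse_price(raw):
    """Convert a scraped price string such as '$1,299.99' to a float.

    Returns None when the input is missing or contains no usable number.
    Hypothetical helper for illustration only.
    """
    if raw is None:
        return None
    # Keep digits and the decimal point; drop currency symbols,
    # thousands separators, and whitespace.
    digits = re.sub(r"[^0-9.]", "", raw)
    try:
        return float(digits)
    except ValueError:
        return None


print(parse_price("$1,299.99"))
print(parse_price("N/A"))
```

Returning `None` for unparseable values keeps missing prices explicit, so they can be dropped or imputed later instead of silently becoming zero.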

Files Description

  • AmazonScraper is the Scrapy project for scraping data from Amazon.com.
  • amazon_scraping.py is the main spider file here which contains code for scraping laptop data from Amazon.com.
    • File Location: /AmazonScraper/AmazonScraper/spiders/amazon_scraping.py
  • visualization.ipynb is the Jupyter notebook containing code for data cleaning, processing, and visualization.
    • File Location: /visualization.ipynb

Documentation Links

The necessary documentation links are attached here:

Languages

  • Jupyter Notebook: 99.0%
  • Python: 1.0%