Saurabhkhandebharad / BigData-SK

Analyzed a multicategory e-commerce store using big data techniques on a Kaggle dataset with the help of AWS EC2, AWS S3, PySpark, AWS Glue ETL, AWS Athena, AWS CloudFormation, AWS Lambda and Power BI!

BigData - Saurabh Khandebharad

E-Commerce Analytics and Big Data Processing (End-to-End Group Project)

Guidance by: Pradeep Tripathi

KAGGLE DATASET: https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store

File Name: 2019-Nov.csv

File Size: 8 GB

Project Architecture

Architecture

Being excellent at data analysis and visualization, I volunteered to do the data cleaning and preprocessing in PySpark. Head over to PySpark.py and check out my code! Handling such a large dataset was fun and a great learning experience!

👉My PySpark Script
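If you want a feel for what this kind of cleaning involves before opening PySpark.py, here is a minimal sketch. It is not the script from this repo: the column names are the ones listed on the Kaggle dataset page, while the cleaning rules, the function names `clean_events` and `run`, and the input/output paths are all assumptions made for illustration.

```python
# Illustrative sketch only; the real logic lives in PySpark.py.

# Column names as listed on the Kaggle dataset page (assumed unchanged).
EXPECTED_COLUMNS = [
    "event_time", "event_type", "product_id", "category_id",
    "category_code", "brand", "price", "user_id", "user_session",
]

# Event types documented for the dataset (assumption).
VALID_EVENT_TYPES = ["view", "cart", "remove_from_cart", "purchase"]


def clean_events(df):
    """Return a cleaned copy of the raw events DataFrame (hypothetical rules)."""
    # Imported here so this module can be loaded on a machine without Spark.
    from pyspark.sql import functions as F

    return (
        df.dropDuplicates()
        # Rows without a user, product, or session cannot be attributed.
        .dropna(subset=["user_id", "product_id", "user_session"])
        .filter(F.col("event_type").isin(VALID_EVENT_TYPES))
        # The CSV is read as strings, so cast price before comparing.
        .withColumn("price", F.col("price").cast("double"))
        .filter(F.col("price") > 0)
        # Timestamps look like "2019-11-01 00:00:00 UTC"; strip the suffix, then parse.
        .withColumn(
            "event_time",
            F.to_timestamp(F.regexp_replace("event_time", " UTC$", "")),
        )
        # Brand and category are often empty; keep the rows with a placeholder.
        .fillna({"brand": "unknown", "category_code": "unknown"})
    )


def run(input_path, output_path):
    """Read the raw CSV, clean it, and write Parquet (paths supplied by the caller)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("ecommerce-cleaning-sketch")
        # Parse timestamps as UTC regardless of the machine's local time zone.
        .config("spark.sql.session.timeZone", "UTC")
        .getOrCreate()
    )
    raw = spark.read.csv(input_path, header=True)
    clean_events(raw).write.mode("overwrite").parquet(output_path)
    spark.stop()
```

Calling something like `run("2019-Nov.csv", "cleaned/2019-Nov")` would process the file locally (both paths are placeholders). Writing Parquet rather than CSV is one reasonable choice here, since columnar files are usually cheaper to query from Athena.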

Power BI Visualizations

Page 1 - Dashboard Page1

Page 2 - Dashboard Page2



Don't forget to leave a star! ⭐

