Heisenberghj7 / Retail-Store-BigData

📊 📑 This project provides a step-by-step walkthrough of big data analytics applied to the retail industry, using a variety of big data technologies such as HDFS, Hive, and Spark.


πŸ“‰ πŸ§‘β€πŸ’» Retail Store BigData πŸ“ŠπŸ“¦

Project Architecture 🖊️

Part I: Data Migration & Data Analysis

Importing a Table from MySQL to HDFS:

  1. Create the database and the tables in MySQL.
  2. Use Sqoop to import the tables of the retail store database and save them in HDFS under "/user".
  3. Import the tables in Parquet format rather than the default file format (text file).
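
The import steps above can be sketched as a small Python helper that assembles one `sqoop import` command per table. This is a minimal sketch, not the project's actual script: the JDBC URL, user name, password file path, and table name in the usage note are hypothetical placeholders, and the helper only builds the argument list (it does not run Sqoop). The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--as-parquetfile`, `--num-mappers`) are standard `sqoop import` options.

```python
from typing import List


def build_sqoop_import(
    jdbc_url: str,
    username: str,
    password_file: str,
    table: str,
    base_dir: str = "/user",
    num_mappers: int = 1,
) -> List[str]:
    """Build the argv list for importing one MySQL table into HDFS as Parquet.

    All argument values are supplied by the caller; nothing here is taken
    from the project itself.
    """
    # Each table gets its own directory under the HDFS base directory.
    target_dir = f"{base_dir.rstrip('/')}/{table}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids a plain-text password on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--as-parquetfile",                # Parquet instead of the default text files
        "--num-mappers", str(num_mappers),
    ]
```

On a machine where Sqoop is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`, for example with `build_sqoop_import("jdbc:mysql://localhost:3306/retail_db", "retail_user", "/user/retail_user/.mysql-password", "orders")`, where every value is an assumed placeholder.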

Data Analysis: First, we import the data from HDFS into Hive. HiveQL is Hive's query language, a dialect of SQL for big data. Using HiveQL, we determine:

  • How many orders were placed
  • Average revenue per order
  • Average revenue per day per product
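
To make the three metrics above concrete, here is a minimal sketch of queries that could compute them. Several things are assumptions: the table and column names (`orders`, `order_items`, `order_date`, `product_id`, `subtotal`) are hypothetical, the queries use only plain SQL constructs that HiveQL also accepts, and an in-memory SQLite database stands in for Hive so the example can run anywhere. "Average revenue per day per product" is read here as each product's revenue summed per day and then averaged over the days on which it sold, which is only one possible interpretation.

```python
import sqlite3

# Metric 1: how many orders were placed.
ORDER_COUNT_SQL = "SELECT COUNT(*) FROM orders"

# Metric 2: total each order's revenue, then average across orders.
AVG_REVENUE_PER_ORDER_SQL = """
SELECT AVG(order_revenue)
FROM (
    SELECT order_id, SUM(subtotal) AS order_revenue
    FROM order_items
    GROUP BY order_id
) per_order
"""

# Metric 3: total each product's revenue per day, then average over days.
AVG_DAILY_REVENUE_PER_PRODUCT_SQL = """
SELECT product_id, AVG(daily_revenue) AS avg_daily_revenue
FROM (
    SELECT oi.product_id, o.order_date, SUM(oi.subtotal) AS daily_revenue
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY oi.product_id, o.order_date
) per_day
GROUP BY product_id
ORDER BY product_id
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows (not project data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
    conn.execute(
        "CREATE TABLE order_items ("
        "order_item_id INTEGER, order_id INTEGER, product_id INTEGER, subtotal REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "2014-01-01"), (2, "2014-01-01"), (3, "2014-01-02")],
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 100.0), (2, 1, 20, 50.0), (3, 2, 10, 200.0), (4, 3, 10, 60.0)],
    )
    return conn


def compute_metrics(conn: sqlite3.Connection) -> dict:
    """Run the three queries and collect the results in a dictionary."""
    return {
        "order_count": conn.execute(ORDER_COUNT_SQL).fetchone()[0],
        "avg_revenue_per_order": conn.execute(AVG_REVENUE_PER_ORDER_SQL).fetchone()[0],
        "avg_daily_revenue_per_product": dict(
            conn.execute(AVG_DAILY_REVENUE_PER_PRODUCT_SQL).fetchall()
        ),
    }
```

Calling `compute_metrics(build_toy_db())` returns the three metrics for the toy rows. Against a real Hive warehouse, the same query strings would be submitted through a Hive client instead of SQLite, with the table and column names adjusted to the actual schema.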

Part II: Power BI

  • (In Progress)

Part III: Spark SQL and PySpark

  • (In Progress)

  • 📫 Feel free to contact me if anything is wrong or if anything needs to be changed 😎! medhajjari9@gmail.com



Languages

Language:Python 100.0%