dBenf / Big-Data-Engineering

Intra-course homework assignments and the final homework for the Big Data Engineering course. Includes the KPMG Hackathon 'University Trends' documentation.

Big-Data-Engineering

This repository contains the homework assignments and other material for the Big Data Engineering course (AY 21/22) at the University of Naples Federico II.

Homeworks

All homework assignments were developed in teams of two.

  • Homework1-MongoDB: design and development of a NoSQL database with MongoDB Compass to store the Yelp Dataset collections.
  • Homework2-ApachePig: processing of the Yelp Dataset Reviews collection with Apache Pig, using the Pig Latin language.
  • Homework3-ApacheSpark: distributed processing with Spark (PySpark), run on Google Colab, to analyze the Yelp Dataset collections.
  • HomeworkFinale-KPMG: analysis of the MIUR and ISTAT open datasets on university enrollment, using Python for data pre-processing, MongoDB for storage, and Apache Spark for the analysis.
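
As a rough illustration of the Python pre-processing step mentioned for the final homework, the sketch below turns raw CSV rows into clean dictionaries that could later be stored as MongoDB documents. It is not taken from the repository: the column names (`academic_year`, `university`, `enrolled`), the sample rows, and the helper `rows_to_documents` are all hypothetical, and the real MIUR and ISTAT files may be structured differently.

```python
import csv
import io

# Hypothetical sample data: the real MIUR/ISTAT column names and values
# are not shown in this README, so these are placeholders only.
SAMPLE_CSV = """academic_year,university,enrolled
2019/2020,Napoli Federico II,1200
2020/2021,Napoli Federico II,1350
2020/2021,Salerno,
"""


def rows_to_documents(csv_text):
    """Convert CSV text into a list of dictionaries, one per valid row."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        enrolled = row["enrolled"].strip()
        if not enrolled:
            # Skip rows with a missing count instead of guessing a value.
            continue
        documents.append({
            "academic_year": row["academic_year"].strip(),
            "university": row["university"].strip(),
            "enrolled": int(enrolled),  # store the count as a number
        })
    return documents


documents = rows_to_documents(SAMPLE_CSV)
print(documents)
```

A list of dictionaries in this shape is what a MongoDB driver such as PyMongo accepts for bulk insertion (`Collection.insert_many`), which is one plausible way to hand the cleaned data to the storage step.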

KPMG Hackathon

  • KPMG-UniversityTrends: Python processing of the MIUR and ISTAT open datasets and an analysis of university trends with Pandas DataFrames, plus several dashboards built in Microsoft Power BI.

Languages

Jupyter Notebook 100.0%