cristianaggewerc / big-data-processing

Projects focused on big data processing with the latest big data technologies (Spark) and a NoSQL database (MongoDB). Developed with PySpark.

Repository on GitHub: https://github.com/cristianaggewerc/big-data-processing

Data Processing for Big Data

These projects were developed during the Data Processing for Big Data unit of the Monash MDS. The unit focuses on big data processing, including the latest big data technologies (Spark) and NoSQL databases (MongoDB). The data processing covers data frames and various advanced analytics for big data. Programming exercises and assignments use Spark, MongoDB, DataFrames, and MLlib.

Two of the projects developed are:

  1. Big Data Processing, Analysis and Visualisation: PySpark RDD-based analysis of structured and unstructured data.

  2. Comparison of Machine Learning Algorithms on Big Data: PySpark DataFrame-based analysis and implementation of four algorithms using Spark MLlib and MongoDB.

Languages

Jupyter Notebook: 100.0%