jeanineharb / Big-Data-Analysis-with-Scala-and-Spark

My submissions for the Coursera MOOC "Big Data Analysis with Scala and Spark" given by EPFL.

Big Data Analysis with Scala and Spark

This project contains my submissions for the Coursera MOOC Big Data Analysis with Scala and Spark, offered by EPFL and taught by Prof. Heather C. Miller.

Timeline

  • Date Started: March 14, 2017

  • Date Completed: March 24, 2017

Assignments

Programming Assignment 1: Wikipedia

  • Week: 1

  • Lesson: Basics of Spark's RDDs

  • Description: "In this assignment, we'll use our full-text data from Wikipedia to produce a rudimentary metric of how popular a programming language is, in an effort to see if our Wikipedia-based rankings bear any relation to the popular Red Monk rankings."

  • Grade: 10 / 10
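The metric described above amounts to counting, for each language, the articles that mention it and then sorting by that count. The following is a minimal sketch in plain Scala collections rather than the submitted Spark code: `Article`, `LangRanking`, `occurrences`, and `rank` are hypothetical names chosen for illustration, and matching on whitespace-separated words is an assumption (a mention followed by punctuation would be missed).

```scala
// Hypothetical article type; the course's own data model may differ.
final case class Article(title: String, text: String)

object LangRanking {
  // Counts the articles whose text contains `lang` as a
  // whitespace-separated word (assumed matching rule).
  def occurrences(lang: String, articles: Seq[Article]): Int =
    articles.count(_.text.split("\\s+").contains(lang))

  // Pairs each language with its article count, highest count first.
  // sortBy is stable, so tied languages keep their input order.
  def rank(langs: Seq[String], articles: Seq[Article]): Seq[(String, Int)] =
    langs.map(l => (l, occurrences(l, articles))).sortBy { case (_, n) => -n }
}
```

On an RDD of articles, the same shape would presumably use `filter` and `count` per language; collections are used here only so the sketch runs without a Spark dependency.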

Programming Assignment 2: StackOverflow

  • Week: 2 (two-week-long assignment)

  • Lesson: Reduction Operations & Distributed Key-Value Pairs

  • Description: "The overall goal of this assignment is to implement a distributed k-means algorithm which clusters posts on the popular question-answer platform StackOverflow according to their score. Moreover, this clustering should be executed in parallel for different programming languages, and the results should be compared."

  • Grade: 10 / 10
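For readers unfamiliar with k-means, the core loop can be sketched as follows. This is a single-machine illustration on plain Scala collections, not the distributed implementation that was submitted: the two-dimensional `Point` type, the object name `KMeansSketch`, and the `eta` and `maxIter` parameters are all assumptions made for the example.

```scala
object KMeansSketch {
  // Hypothetical 2-D point; the assignment's own vector encoding may differ.
  type Point = (Double, Double)

  def squaredDistance(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // Index of the mean nearest to p (means must be non-empty).
  def closest(p: Point, means: Vector[Point]): Int =
    means.indices.minBy(i => squaredDistance(p, means(i)))

  // Component-wise average of a non-empty group of points.
  def average(ps: Seq[Point]): Point =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  // One iteration: assign each point to its nearest mean, then
  // recompute every mean. A mean with no points keeps its old value.
  def step(points: Seq[Point], means: Vector[Point]): Vector[Point] = {
    val clusters = points.groupBy(p => closest(p, means))
    means.indices.map(i => clusters.get(i).map(average).getOrElse(means(i))).toVector
  }

  // Repeats `step` until the means move less than `eta` in total
  // squared distance, or `maxIter` iterations have run.
  @annotation.tailrec
  def run(points: Seq[Point], means: Vector[Point],
          eta: Double = 1e-6, maxIter: Int = 100): Vector[Point] = {
    val next = step(points, means)
    val moved = means.zip(next).map { case (a, b) => squaredDistance(a, b) }.sum
    if (moved < eta || maxIter <= 1) next else run(points, next, eta, maxIter - 1)
  }
}
```

In a distributed setting, the `groupBy` in `step` is the part that would run on the cluster (for instance as a keyed grouping or reduction over an RDD), while the small vector of means is recomputed on each iteration.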

Programming Assignment 3: Time Usage

  • Week: 4

  • Lesson: SQL, Dataframes, and Datasets

  • Description: "Our goal is to identify three groups of activities: primary needs (sleeping and eating), work, other (leisure). And then to observe how do people allocate their time between these three kinds of activities, and if we can see differences between men and women, employed and unemployed people, and young (less than 22 years old), active (between 22 and 55 years old) and elder people."

  • Grade: 10 / 10
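The grouping described above can be illustrated with a small sketch: bucket each respondent by employment status, sex, and age group, then average the time spent on each kind of activity. The `Respondent` fields, the object name `TimeUsageSketch`, the use of hours as the unit, and the inclusive bounds of the "active" bracket are assumptions for illustration; the submitted code works on Spark DataFrames and Datasets instead.

```scala
// Hypothetical, already-aggregated record: hours per day spent on
// each of the three activity kinds.
final case class Respondent(sex: String, age: Int, employed: Boolean,
                            primaryNeeds: Double, work: Double, other: Double)

object TimeUsageSketch {
  // Age brackets from the description; treating 22 and 55 as
  // part of "active" is an assumption.
  def ageGroup(age: Int): String =
    if (age < 22) "young" else if (age <= 55) "active" else "elder"

  // Average (primaryNeeds, work, other) per (employed, sex, ageGroup).
  def summarize(rs: Seq[Respondent])
      : Map[(Boolean, String, String), (Double, Double, Double)] =
    rs.groupBy(r => (r.employed, r.sex, ageGroup(r.age))).map {
      case (key, group) =>
        val n = group.size
        val averages = (group.map(_.primaryNeeds).sum / n,
                        group.map(_.work).sum / n,
                        group.map(_.other).sum / n)
        (key, averages)
    }
}
```

With a DataFrame, the equivalent would likely be a `groupBy` on the three key columns followed by an `avg` aggregation on each activity column.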

Note

The resource files must be unzipped before the code will run.


Languages

Language:Scala 100.0%