nwihardjo / COMP4651

HKUST COMP4651 Fall 18/19: Cloud Computing and Big Data Systems

COMP4651: Cloud Computing and Big Data Systems

Code for HKUST COMP4651 (Fall 2018): Cloud Computing and Big Data Systems

Assignments

Assignment 1: Benchmarking AWS EC2 CPU, memory, and network performance across different instance types and cluster locations

Assignment 2: Java implementation of copying files between HDFS and the local file system while preserving the checksum
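To illustrate the idea behind Assignment 2, here is a minimal Python sketch of a copy that verifies a checksum before and after the transfer. It is not the assignment's Java code and does not touch HDFS: it works on local files only, SHA-256 is a stand-in chosen for illustration, and the names `sha256_of` and `copy_with_checksum` are hypothetical.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_checksum(src, dst):
    """Copy src to dst and raise if the destination digest differs from the source.

    Assumption: both paths are on the local file system; an HDFS version
    would use the Hadoop FileSystem API instead of shutil.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    actual = sha256_of(dst)
    if actual != expected:
        raise IOError(f"checksum mismatch: {expected} != {actual}")
    return actual
```

Computing the digest in chunks keeps memory use constant regardless of file size, which matters for the large files typical of HDFS workloads.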

Assignment 3: MapReduce programming in Java for bigram counting and frequency calculation based on the Stripes and Pairs design patterns
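The difference between the Pairs and Stripes patterns named in Assignment 3 can be sketched in plain Python. This is an in-memory illustration rather than the assignment's Java MapReduce code; the function names are hypothetical, tokens are split on whitespace, and "frequency" is assumed to mean the relative frequency of the second word given the first.

```python
from collections import Counter, defaultdict


def bigrams(line):
    """Yield adjacent word pairs from one line (whitespace tokenization)."""
    words = line.split()
    return zip(words, words[1:])


def count_pairs(lines):
    """Pairs pattern: one counter entry per (left, right) bigram."""
    counts = Counter()
    for line in lines:
        for left, right in bigrams(line):
            counts[(left, right)] += 1
    return counts


def count_stripes(lines):
    """Stripes pattern: one map per left word, holding counts of its right neighbours."""
    stripes = defaultdict(Counter)
    for line in lines:
        for left, right in bigrams(line):
            stripes[left][right] += 1
    return stripes


def bigram_frequencies(stripes):
    """Relative frequency count(left, right) / count(left, *), computed per stripe."""
    freqs = {}
    for left, stripe in stripes.items():
        total = sum(stripe.values())
        for right, n in stripe.items():
            freqs[(left, right)] = n / total
    return freqs
```

With Stripes, the total needed for the denominator is available inside a single stripe, whereas Pairs emits many small keys and needs extra coordination to obtain the per-word total.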

Assignment 4: Apache Spark

  • Q1: Building a Word Count Application
  • Q2: Web Server Log Analysis
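As a rough picture of what a word count application such as Q1 computes, the following plain-Python sketch tokenizes lines and tallies words. It does not use Spark and is not taken from the assignment notebook; the tokenization rule (lowercase, runs of letters, digits, and apostrophes) and the names `tokenize`, `word_count`, and `top_words` are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(line):
    """Lowercase a line and extract runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(lines):
    """Count word occurrences across all lines (the 'map' and 'reduce' steps combined)."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def top_words(lines, n):
    """Return the n most frequent words as (word, count) tuples."""
    return word_count(lines).most_common(n)
```

In Spark the same logic would be distributed across partitions, with per-word counts merged by key; this sketch keeps everything in one process for clarity.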

Assignment 5: Power Plant Machine Learning Pipeline Application with Apache Spark

Additional

DataFrame Live Programming: Hands-on Spark DataFrame live-programming tutorial from the Spark SF Meetup 2016

Spark Tutorial: Apache Spark tutorial heavily adapted from Spark MOOC

EMR Test: Test for Amazon EMR and S3 instances


Languages

Jupyter Notebook 88.9%, Python 7.3%, Java 3.9%