There is 1 repository under the gcp-dataproc topic.
E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau
Customisable Dataproc HA cluster on debian-9 with ZooKeeper, Kafka, BigQuery and other tools/jobs, with Terraform
Monte Carlo stock simulation using Apache Spark.
Project for Cloud Computing course (A.Y. 2018/2019)
Implements a work queue for Dataproc Workflow Template executions
Apache Spark sandbox on GCP and Amazon EMR.
First project for Big Data course held at Roma Tre University
Processes a large amount of data and implements complex data analyses using Spark. The dataset has been made available by Google. It includes data about a cluster of 12500 machines and the activity on this cluster over 29 days.
Data is fetched from StackExchange, transformed using Pig, queried and stored in Hive. Additionally, the TF-IDF of the top 10 users is calculated using Hive.
Hadoop Google DataProc DIO study
Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.