No3Mc / Goodreads-Dataflow-Book-Analytics

Goodreads Datasets

Home Page: https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home

Phase Test (pptx) prep

Weekly Journals: No3's

Week 1, Week 2, Week 3, Week 4, Week 5

1.0 INTRODUCTION
2.0 SCENARIO

A subset of the Goodreads data has been selected for this coursework. It covers books, reviews, authors, and genres. Permission to use the data, obtained from https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home, was granted for this coursework on condition that the authors’ research papers are referenced (Mengting W. and Julian J. M 2018), (Mengting W et al. 2019). It is not necessary to read these papers to complete this coursework.

3.0 CONNECT, EXTRACT, TRANSFORM AND LOAD DATA (CETL) [15 marks]

4.0 CLEANING THE COLLECTIONS [20 marks]
5.0 QUERYING THE COLLECTIONS [20 marks]
6.0 IMPLEMENT AND EXPLAIN INDEX FOR THE DATABASE [15 marks]
7.0 RE-DESIGN THE DATABASE USING AGGREGATE DATA MODELLING [20 marks]
8.0 WEEKLY JOURNALS [10 marks]
9.0 DELIVERABLES
10.0 CLOSING COMMENTS