MongoDB-Hadoop Workshop Exercises

MongoDB powers applications as an operational database, and Hadoop delivers intelligence with its powerful analytical infrastructure. In this workshop we'll start by learning how these technologies fit together with the MongoDB Connector for Hadoop. Then we'll cover reading and writing MongoDB data using MapReduce, Pig, Hive, and Spark. Finally, we'll discuss the broader data ecosystem and operational considerations.
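To show how the connector slots into a plain MapReduce job before you dive into the exercises, below is a minimal sketch of a job that counts ratings per movie directly against MongoDB. It is not taken from this repository: the class names, the movielens.ratings and movielens.rating_counts URIs, and the movieid field name are illustrative assumptions, so refer to the MapReduce exercise for the real code.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;
    import com.mongodb.hadoop.util.MongoConfigUtil;

    // Sketch only: counts ratings per movie, reading from and writing to MongoDB.
    public class RatingCountJob {

        public static class RatingMapper
                extends Mapper<Object, BSONObject, IntWritable, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final IntWritable movieId = new IntWritable();

            @Override
            protected void map(Object key, BSONObject value, Context context)
                    throws IOException, InterruptedException {
                // "movieid" is an assumed field name in the ratings documents.
                movieId.set(((Number) value.get("movieid")).intValue());
                context.write(movieId, ONE);
            }
        }

        public static class RatingReducer
                extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
            @Override
            protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Point the connector at the input and output collections.
            MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/movielens.ratings");
            MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/movielens.rating_counts");

            Job job = Job.getInstance(conf, "rating count");
            job.setJarByClass(RatingCountJob.class);

            // The Mongo formats replace the usual HDFS-based input/output formats,
            // so mappers receive BSON documents rather than lines of text.
            job.setInputFormatClass(MongoInputFormat.class);
            job.setOutputFormatClass(MongoOutputFormat.class);

            job.setMapperClass(RatingMapper.class);
            job.setReducerClass(RatingReducer.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }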

Data

Prior to running any of the exercises, load the sample MovieLens dataset into MongoDB.

With the MovieLens movies.dat and ratings.dat files on hand, load the dataset:

$ python dataset/movielens.py [/path/to/movies.dat] [/path/to/ratings.dat]

For more information refer to the dataset README.
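Once the loader finishes, a quick count of the inserted documents confirms the data is in place. The snippet below is a minimal sketch using the MongoDB Java driver (3.x); the movielens database and the movies/ratings collection names are assumptions, so check the dataset README for the names the loader actually uses.

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoDatabase;

    // Minimal sanity check after loading the MovieLens data.
    public class VerifyDatasetLoad {
        public static void main(String[] args) {
            // Assumed database/collection names; see the dataset README for the real ones.
            try (MongoClient client = new MongoClient("localhost", 27017)) {
                MongoDatabase db = client.getDatabase("movielens");
                System.out.println("movies:  " + db.getCollection("movies").count());
                System.out.println("ratings: " + db.getCollection("ratings").count());
            }
        }
    }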

Exercises

Refer to the individual READMEs for steps on building and deploying each exercise.

License

Apache License 2.0