Scaling Database Systems - Artifacts

Artifacts for the course Scaling Database Systems at the University of Passau.

The material here accompanies the Udacity course Intro to Hadoop and MapReduce and shows how to run its examples in the miniHive Docker container.

Example

These steps are necessary to run the example on Hadoop (in the miniHive Docker container):

# Clone repository in the miniHive Docker container
git clone https://github.com/sdbs-uni-p/sds-artifacts
cd sds-artifacts

# Unzip data
tar xzf purchases.tgz

# Hadoop preparation
hdfs dfs -mkdir -p /user/minihive
hdfs dfs -put purchases.txt

# Execute
cd 01-sales-per-store
mapred streaming -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input purchases.txt -output sales-per-store

Please note:

  • Hadoop requires mapper.py and reducer.py to be executable files (e.g. chmod +x mapper.py reducer.py).
  • The code in mapper.py and reducer.py is incomplete; you have to finish it before running the job.
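
As a starting point for completing the two scripts, here is a minimal sketch of the mapper and reducer logic for a sales-per-store job, written as plain functions so it can be tried locally without Hadoop. It is not the course's reference solution. It assumes that each line of purchases.txt holds six tab-separated fields (date, time, store, item, cost, payment); the function names map_lines, reduce_lines and run_locally are hypothetical, and the sample records are made up for illustration.

```python
#!/usr/bin/env python3
"""Local sketch of a sales-per-store job (illustrative only)."""


def map_lines(lines):
    """Mapper step: emit 'store<TAB>cost' for every well-formed record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: date, time, store, item, cost, payment (6 fields).
        if len(fields) != 6:
            continue  # skip malformed lines instead of failing the whole job
        store, cost = fields[2], fields[4]
        yield f"{store}\t{cost}"


def reduce_lines(sorted_lines):
    """Reducer step: sum the costs per store.

    The input must already be grouped by store, which is what the sort
    between the map and reduce phases provides.
    """
    current_store = None
    total = 0.0
    for line in sorted_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        store, cost = parts
        if store != current_store:
            # A new key starts: flush the total of the previous store.
            if current_store is not None:
                yield f"{current_store}\t{total}"
            current_store = store
            total = 0.0
        total += float(cost)
    # Flush the last store once the input is exhausted.
    if current_store is not None:
        yield f"{current_store}\t{total}"


def run_locally(lines):
    """Imitate map -> sort by key -> reduce in a single process."""
    mapped = sorted(map_lines(lines), key=lambda kv: kv.split("\t", 1)[0])
    return list(reduce_lines(mapped))


if __name__ == "__main__":
    # Made-up sample records in the assumed six-field format.
    sample = [
        "2012-01-01\t09:00\tSan Jose\tMusic\t10.50\tVisa\n",
        "2012-01-01\t09:01\tMiami\tBooks\t5.25\tCash\n",
        "2012-01-01\t09:02\tSan Jose\tToys\t4.50\tAmex\n",
    ]
    for row in run_locally(sample):
        print(row)
```

In the actual mapper.py and reducer.py, each step would read from sys.stdin and print to standard output, because Hadoop streaming passes records through the scripts line by line. Since Hadoop runs the scripts directly, each also needs an interpreter line such as #!/usr/bin/env python3 at the top in addition to the executable bit.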
