Lorkbong (named after the staff carried by Sun Wukong) is a very very simple example Heroku app that lets you trigger showing job status or launching a new job, either by visiting a special URL or by triggering a rake task.
- Create the app:
- Edit the obvious files in
config/
. You can make life much more dangerous but slightly simpler by adding your keys to the config. If you do this, ake sure you DON’T check this app into a public repo or we’ll switch all our infrastructure to run off your account.
- You should probably be more responsible and use environment variables for your credentials. Use the heroku commandline tool to run the following command:
- Now visit http://lorkbong-example.heroku.com (or whatever you called it). Follow the link for ‘list jobs’. You should see a listing of any jobs in your queue.
- Right now the job info is hardcoded into the file
app/helpers/emr_script.rb
(sorry). Open that file and edit the block so that it runs (for the first time) with ‘alive’ set to true.
# # You need to edit the following things: # EMR_OPTS = { # # Path to the runner. :emr_runner => "#{::ROOT_DIR}/vendor/elastic-mapreduce/elastic-mapreduce", # # Temp storage for the keypair file (elastic-mapreduce script demands it be a static file). :keypair_file => ::ROOT_DIR+'/tmp/emr_keypair.pem', # # If you're debugging: # # first run with alive set to true, and launch the job. :alive => true, # # After the job has been created and run for the first time, fill your # # jobflow into the following and set alive back to nil. # :jobflow => "j-18OUFBXJ0Z01W", } # Path to the input files. Note the 's3n' prefix. EMR_INPUT = "s3n://emr.yourdomain.com/wukong/data/examples/links-simple-sorted-10k.txt" # Path to the output files. This directory must not exist. Note the 's3n' prefix. EMR_OUTPUT = "s3n://emr.yourdomain.com/wukong/data/examples/wp-link-degree-4"
- Make note of the jobflow ID (or check the list jobs path at /emr/list ) and hack that into the file above for debugging.
- Check the AWS console for a closer look at the job progress. You can also find the public IP of the master node from the console, and log in to the machine directly:
ssh -i /path/to/your/keypair.pem hadoop@ec2-148-37-14-128.compute-1.amazonaws.com
- The frontend app is based on Monk / Cartilage, a skeleton for building Sinatra apps.