mattbornski / Hive-Demo

Following along with the Hive tutorial at StrataConf / HadoopWorld

<!DOCTYPE html>  
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <title>README</title>
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <style>
/* 
   This document has been created with Marked.app <http://markedapp.com>, Copyright 2011 Brett Terpstra
   Please leave this notice in place, along with any additional credits below.
   ---------------------------------------------------------------
   Title: GitHub
   Author: Brett Terpstra
   Description: Github README style. Includes theme for Pygmentized code blocks.
*/
html,body{color:black}*{margin:0;padding:0}body{font:13.34px helvetica,arial,freesans,clean,sans-serif;-webkit-font-smoothing:antialiased;line-height:1.4;padding:3px;background:#fff;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px}p{margin:1em 0}a{color:#4183c4;text-decoration:none}#wrapper{background-color:#fff;border:3px solid #eee!important;padding:0 30px;margin:15px}#wrapper{font-size:14px;line-height:1.6}#wrapper>*:first-child{margin-top:0!important}#wrapper>*:last-child{margin-bottom:0!important}h1,h2,h3,h4,h5,h6{margin:0;padding:0}h1{margin:15px 0;padding-bottom:2px;font-size:24px;border-bottom:1px solid #eee}h2{margin:20px 0 10px 0;font-size:18px}h3{margin:20px 0 10px 0;padding-bottom:2px;font-size:14px;border-bottom:1px solid #ddd}h4{font-size:14px;line-height:26px;padding:18px 0 4px;font-weight:bold;text-transform:uppercase}h5{font-size:13px;line-height:26px;padding:14px 0 0;font-weight:bold;text-transform:uppercase}h6{color:#666;font-size:14px;line-height:26px;padding:18px 0 0;font-weight:normal;font-variant:italic}hr{background:transparent 
url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAYAAAAECAYAAACtBE5DAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyJpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEzNDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBNYWNpbnRvc2giIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OENDRjNBN0E2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OENDRjNBN0I2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo4Q0NGM0E3ODY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo4Q0NGM0E3OTY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PqqezsUAAAAfSURBVHjaYmRABcYwBiM2QSA4y4hNEKYDQxAEAAIMAHNGAzhkPOlYAAAAAElFTkSuQmCC) repeat-x 0 0;border:0 none;color:#ccc;height:4px;margin:20px 0;padding:0}#wrapper>h2:first-child,#wrapper>h1:first-child,#wrapper>h1:first-child+h2{border:0;margin:0;padding:0}#wrapper>h3:first-child,#wrapper>h4:first-child,#wrapper>h5:first-child,#wrapper>h6:first-child{margin:0;padding:0}h4+p,h5+p,h6+p{margin-top:0}li p.first{display:inline-block}ul,ol{margin:15px 0 15px 25px}ul li,ol li{margin-top:7px;margin-bottom:7px}ul li>*:last-child,ol li>*:last-child{margin-bottom:0}ul li>*:first-child,ol li>*:first-child{margin-top:0}#wrapper>ul,#wrapper>ol{margin-top:21px;margin-left:36px}dl{margin:0;padding:20px 0 0}dl dt{font-size:14px;font-weight:bold;line-height:normal;margin:0;padding:20px 0 0}dl 
dt:first-child{padding:0}dl dd{font-size:13px;margin:0;padding:3px 0 0}blockquote{margin:14px 0;border-left:4px solid #ddd;padding-left:11px;color:#555}table{border-collapse:collapse;margin:20px 0 0;padding:0}table tr{border-top:1px solid #ccc;background-color:#fff;margin:0;padding:0}table tr:nth-child(2n){background-color:#f8f8f8}table tr th,table tr td{border:1px solid #ccc;text-align:left;margin:0;padding:6px 13px}img{max-width:100%;height:auto}code,tt{margin:0 2px;padding:2px 5px;white-space:nowrap;border:1px solid #ccc;background-color:#f8f8f8;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px;font-size:12px}pre>code{margin:0;padding:0;white-space:pre;border:0;background:transparent;font-size:13px}.highlight pre,pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px;-moz-border-radius:3px;-webkit-border-radius:3px}#wrapper>pre,#wrapper>div.highlight{margin:10px 0 0}pre code,pre tt{background-color:transparent;border:0}#wrapper{background-color:#fff;border:1px solid #cacaca;padding:30px}.poetry pre{font-family:Georgia,Garamond,serif!important;font-style:italic;font-size:110%!important;line-height:1.6em;display:block;margin-left:1em}.poetry pre code{font-family:Georgia,Garamond,serif!important}sup,sub,a.footnote{font-size:1.4ex;height:0;line-height:1;vertical-align:super;position:relative}sub{vertical-align:sub;top:-1px}@media print{body{background:#fff}img,pre,blockquote,table,figure{page-break-inside:avoid}#wrapper{background:#fff;border:0}code{background-color:#fff;color:#444!important;padding:0 .2em;border:1px solid #dedede}pre code{background-color:#fff!important;overflow:visible}pre{background:#fff}}@media screen{body.inverted,.inverted #wrapper,.inverted hr .inverted p,.inverted td,.inverted li,.inverted h1,.inverted h2,.inverted h3,.inverted h4,.inverted h5,.inverted h6,.inverted th,.inverted .math,.inverted caption,.inverted dd,.inverted dt,.inverted 
blockquote{color:#eee!important;border-color:#555}.inverted td,.inverted th{background:#333}.inverted pre,.inverted code,.inverted tt{background:#444!important}.inverted h2{border-color:#555}.inverted hr{border-color:#777;border-width:1px!important}::selection{background:rgba(157,193,200,.5)}h1::selection{background-color:rgba(45,156,208,.3)}h2::selection{background-color:rgba(90,182,224,.3)}h3::selection,h4::selection,h5::selection,h6::selection,li::selection,ol::selection{background-color:rgba(133,201,232,.3)}code::selection{background-color:rgba(0,0,0,.7);color:#eee}code span::selection{background-color:rgba(0,0,0,.7)!important;color:#eee!important}a::selection{background-color:rgba(255,230,102,.2)}.inverted a::selection{background-color:rgba(255,230,102,.6)}td::selection,th::selection,caption::selection{background-color:rgba(180,237,95,.5)}.inverted{background:#0b2531}.inverted #wrapper,.inverted{background:rgba(37,42,42,1)}.inverted a{color:rgba(172,209,213,1)}}.highlight .c{color:#998;font-style:italic}.highlight .err{color:#a61717;background-color:#e3d2d2}.highlight .k{font-weight:bold}.highlight .o{font-weight:bold}.highlight .cm{color:#998;font-style:italic}.highlight .cp{color:#999;font-weight:bold}.highlight .c1{color:#998;font-style:italic}.highlight .cs{color:#999;font-weight:bold;font-style:italic}.highlight .gd{color:#000;background-color:#fdd}.highlight .gd .x{color:#000;background-color:#faa}.highlight .ge{font-style:italic}.highlight .gr{color:#a00}.highlight .gh{color:#999}.highlight .gi{color:#000;background-color:#dfd}.highlight .gi .x{color:#000;background-color:#afa}.highlight .go{color:#888}.highlight .gp{color:#555}.highlight .gs{font-weight:bold}.highlight .gu{color:#800080;font-weight:bold}.highlight .gt{color:#a00}.highlight .kc{font-weight:bold}.highlight .kd{font-weight:bold}.highlight .kn{font-weight:bold}.highlight .kp{font-weight:bold}.highlight .kr{font-weight:bold}.highlight .kt{color:#458;font-weight:bold}.highlight 
.m{color:#099}.highlight .s{color:#d14}.highlight .na{color:#008080}.highlight .nb{color:#0086b3}.highlight .nc{color:#458;font-weight:bold}.highlight .no{color:#008080}.highlight .ni{color:#800080}.highlight .ne{color:#900;font-weight:bold}.highlight .nf{color:#900;font-weight:bold}.highlight .nn{color:#555}.highlight .nt{color:#000080}.highlight .nv{color:#008080}.highlight .ow{font-weight:bold}.highlight .w{color:#bbb}.highlight .mf{color:#099}.highlight .mh{color:#099}.highlight .mi{color:#099}.highlight .mo{color:#099}.highlight .sb{color:#d14}.highlight .sc{color:#d14}.highlight .sd{color:#d14}.highlight .s2{color:#d14}.highlight .se{color:#d14}.highlight .sh{color:#d14}.highlight .si{color:#d14}.highlight .sx{color:#d14}.highlight .sr{color:#009926}.highlight .s1{color:#d14}.highlight .ss{color:#990073}.highlight .bp{color:#999}.highlight .vc{color:#008080}.highlight .vg{color:#008080}.highlight .vi{color:#008080}.highlight .il{color:#099}.highlight .gc{color:#999;background-color:#eaf2f5}.type-csharp .highlight .k{color:#00F}.type-csharp .highlight .kt{color:#00F}.type-csharp .highlight .nf{color:#000;font-weight:normal}.type-csharp .highlight .nc{color:#2b91af}.type-csharp .highlight .nn{color:#000}.type-csharp .highlight .s{color:#a31515}.type-csharp .highlight .sc{color:#a31515}
</style>

</head>
<body class="normal">
  <div id="wrapper">
      <p><img src="images/SmallThinkBigIcon.png" alt="" /></p>

<h1 id="readmeforhadoopdatawarehousingwithhive">README for &#8220;Hadoop Data Warehousing with Hive&#8221;</h1>

<h2 id="stratahadoopworld2012tutorialexercises">Strata + Hadoop World 2012 Tutorial Exercises</h2>

<p>Dean Wampler<br/>
<a href="&#109;&#x61;&#x69;&#x6c;&#x74;&#x6f;&#58;&#x61;&#x63;&#x61;&#100;&#101;&#x6d;&#x79;&#x40;&#x74;&#104;&#105;&#110;&#x6b;&#x62;&#x69;&#103;&#x61;&#110;&#97;&#x6c;&#121;&#116;&#105;&#x63;&#115;&#46;&#99;&#x6f;&#109;">&#97;&#x63;&#x61;&#x64;&#x65;&#x6d;&#121;&#x40;&#x74;&#x68;&#105;&#110;&#107;&#x62;&#x69;&#x67;&#x61;&#110;&#x61;&#108;&#x79;&#116;&#x69;&#x63;&#x73;&#x2e;&#99;&#111;&#109;</a><br/>
<a href="https://twitter.com/thinkBigA/">@thinkBigA</a></p>

<p><strong>Welcome!</strong> <em>Please follow these instructions to download the tutorial presentation and exercises.</em></p>

<h1 id="aboutthishivetutorial">About this Hive Tutorial</h1>

<p>This Hive Tutorial is adapted from a longer Think Big Academy course on Hive. (The Academy is the education arm of Think Big Analytics.) We offer various public and private courses on Hadoop programming, Hive, Pig, etc. We also provide consulting on Big Data problems and their solutions, especially using Hadoop. If you want to learn more, visit <a href="http://thinkbiganalytics.com">thinkbiganalytics.com</a> or <a href="&#x6d;&#x61;&#105;&#x6c;&#x74;&#111;&#58;&#x69;&#x6e;&#x66;&#111;&#x40;&#x74;&#x68;&#105;&#x6e;&#107;&#x62;&#105;&#103;&#97;&#x6e;&#97;&#108;&#x79;&#116;&#x69;&#x63;&#x73;&#x2e;&#99;&#x6f;&#x6d;">&#x73;&#101;&#110;&#x64; &#x75;&#115; &#x65;&#x6d;&#x61;&#105;&#108;</a>.</p>

<p>We&#8217;ll log into <a href="http://aws.amazon.com/elasticmapreduce/">Amazon Elastic MapReduce</a> (EMR) clusters<a href="#fn:1" id="fnref:1" title="see footnote" class="footnote">[1]</a> to do the exercises.
Feel free to <em>pair program</em> with a neighbor, if you want.</p>

<p><strong>NOTE:</strong> The exercises should work with any version of Hive, v0.7.1 or later.</p>

<h2 id="gettingstarted">Getting Started</h2>

<p>Download the following zip file that contains a PDF of the tutorial presentation, the exercises, the data used for the exercises, and a Hive <em>cheat sheet</em>:</p>

<ul>
<li><a href="https://s3.amazonaws.com/thinkbigacademy/StrataHW2012/Hive-Tutorial/tutorial.zip">Hive Tutorial, Exercises, Data, etc.</a></li>
</ul>

<p>Unzip <code>tutorial.zip</code> in a convenient place on your laptop.</p>

<p>If you are on Windows, you&#8217;ll need the <code>ssh</code> client application <a href="http://the.earth.li/~sgtatham/putty/latest/x86/putty.zip">PuTTY</a> to log into the EMR servers. You can download it from here:</p>

<ul>
<li><a href="http://the.earth.li/~sgtatham/putty/latest/x86/putty.zip">PuTTY download (zip)</a></li>
</ul>

<h4 id="manifestfortutorialzipfile">Manifest for Tutorial Zip File</h4>

<table>
<colgroup>
<col style="text-align:left;"/>
<col style="text-align:left;"/>
</colgroup>

<thead>
<tr>
	<th style="text-align:left;">Item</th>
	<th style="text-align:left;">Description</th>
</tr>
</thead>

<tbody>
<tr>
	<td style="text-align:left;"><code>README.html</code></td>
	<td style="text-align:left;">What you&#8217;re reading!</td>
</tr>
<tr>
	<td style="text-align:left;"><code>ThinkBigAcademy-Hive-Tutorial.pdf</code></td>
	<td style="text-align:left;">The tutorial presentation.</td>
</tr>
<tr>
	<td style="text-align:left;"><code>exercises</code></td>
	<td style="text-align:left;">The exercises we&#8217;ll use. They are also installed on the clusters, but you&#8217;ll open them &#8220;locally&#8221; in an editor, then use copy and paste.</td>
</tr>
<tr>
	<td style="text-align:left;"><code>data</code></td>
	<td style="text-align:left;">The data files we&#8217;ll use. They are here only for your reference later. We&#8217;ll use the copies already on the clusters.</td>
</tr>
<tr>
	<td style="text-align:left;"><code>HiveCheatSheat.html</code></td>
	<td style="text-align:left;">A Hive <em>cheat sheet</em>.</td>
</tr>
<tr>
	<td style="text-align:left;"><code>exercises/.hiverc</code></td>
	<td style="text-align:left;">Drop this file in the home directory on any machine where you normally run the <code>hive</code> <em>command-line interface</em> (CLI). Hive runs the commands it contains on startup, so it is a great place to put commands you always want to run, such as property settings. It is already installed on the cluster.</td>
</tr>
</tbody>
</table>
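<p>As a rough illustration of the kind of startup commands a <code>.hiverc</code> file can hold, the sketch below writes two property settings to a sample file. The file name <code>hiverc.example</code> and the choice of properties (printing column headers, and letting Hive run small jobs in local mode) are assumptions for illustration, not the contents of the tutorial&#8217;s own <code>exercises/.hiverc</code>. The sample is written to the current directory so that it cannot overwrite a <code>.hiverc</code> that is already in place:</p>

```shell
# Write a sample startup file (hypothetical name) in the current directory.
# Each line is an ordinary Hive CLI "set" command.
cat > hiverc.example <<'EOF'
set hive.cli.print.header=true;
set hive.exec.mode.local.auto=true;
EOF

# Show what was written.
cat hiverc.example
```

<p>To try such a file, you would copy it to <code>.hiverc</code> in your home directory on a machine where you run <code>hive</code>.</p>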
<h3 id="logintooneoftheamazonelasticmapreduceclusters">Log into one of the Amazon Elastic MapReduce Clusters</h3>

<p>We have several EMR clusters running and you&#8217;ll log into one of them according to the <em>first</em> one or two letters of your <em>last</em> name, using the following table<a href="#fn:2" id="fnref:2" title="see footnote" class="footnote">[2]</a>:</p>

<table>
<colgroup>
<col style="text-align:left;"/>
<col style="text-align:left;"/>
<col style="text-align:left;"/>
</colgroup>

<thead>
<tr>
	<th style="text-align:left;"><strong>Letters</strong></th>
	<th style="text-align:left;"><strong>Server Name</strong></th>
	<th style="text-align:left;"><strong>JobFlow ID</strong></th>
</tr>
</thead>

<tbody>
<tr>
	<td style="text-align:left;"><code>A</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Ba - Bh</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Bi - Bz</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Ca - Ch</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Ci - Cz</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>D</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>E  - F</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>G</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>H</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>I  - J</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>K  - L</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Ma - Mh</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Mi - Mz</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>N  - P</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Q  - R</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Sa - Sh</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Si - Sz</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>T  - V</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Wa - Wh</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
<tr>
	<td style="text-align:left;"><code>Wi - Z</code></td>
	<td style="text-align:left;"><code>ec2-50-19-185-170.compute-1.amazonaws.com</code></td>
	<td style="text-align:left;"><code>j-1R3E26P0T3IBK</code></td>
</tr>
</tbody>
</table>
<p>(We&#8217;ll explain the <strong>JobFlow ID</strong> later.)</p>

<p>Once you have picked the correct server, log into it with the following <code>ssh</code> command on Linux or Mac OSX, or with the equivalent <code>putty</code> command on Windows, substituting your server name from the table. You&#8217;ll be user <code>hadoop</code>:</p>

<pre><code>ssh hadoop@ec2-NN-NN-NNN-NNN.compute-1.amazonaws.com
</code></pre>

<p>The password is:</p>

<pre><code>strata
</code></pre>

<p>Finally, since you are sharing the primary user account on the cluster, create a personal work directory using <code>mkdir</code> for any file editing that you&#8217;ll do today. Pick a directory name without spaces, such as a typical user name. You will use that same name for another purpose shortly. After creating the directory, change to it with the <code>cd</code> command:</p>

<pre><code>mkdir myusername
cd myusername
</code></pre>
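<p>The same steps can be sketched with a guard against spaces in the name. The variable <code>workdir</code> and its value are placeholders for illustration, not part of the tutorial:</p>

```shell
# Hypothetical directory name; substitute your own.
workdir="myusername"

case "$workdir" in
  *[[:space:]]*)
    # Reject names that contain whitespace.
    echo "Pick a name without spaces" >&2
    ;;
  *)
    # Create the directory if needed, change into it, and confirm the location.
    mkdir -p "$workdir" && cd "$workdir" && pwd
    ;;
esac
```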

<p><strong>Please don&#8217;t break anything!</strong> ;^) Remember, you&#8217;re sharing this cluster. </p>

<p>Feel free to snoop around if you&#8217;re waiting for others. Note that all the Hadoop software is installed in the <code>hadoop</code> user&#8217;s <code>$HOME</code> directory, <code>/home/hadoop</code>.</p>

<h3 id="quickcheatsheetonlinuxshellcommands">Quick Cheat Sheet on Linux Shell Commands</h3>

<p>If you&#8217;re not accustomed to the Linux or Mac OSX <code>bash</code> shell, here are a few hints<a href="#fn:3" id="fnref:3" title="see footnote" class="footnote">[3]</a>:</p>

<h4 id="printyourcurrentworkingdirectory">Print your current working directory</h4>

<pre><code>pwd
</code></pre>

<h4 id="listthecontentsofadirectory">List the contents of a directory</h4>

<p>Add the <code>-l</code> option to show a <em>longer</em> listing with more information. If you omit the directory, the current directory is used:</p>

<pre><code>ls some-directory
ls -l some-directory
</code></pre>

<h4 id="changetoadifferentdirectory">Change to a different directory</h4>

<p>There are four variants, using i) an absolute path, ii) a subdirectory of the current directory, iii) the parent directory of the current directory, and iv) your home directory:</p>

<pre><code>cd /home/hadoop
cd exercises
cd ..
cd ~
</code></pre>

<h4 id="pagethroughthecontentsofafile.">Page through the contents of a file</h4>

<p>Hit the space bar to page, <code>q</code> to quit:</p>

<pre><code>more some-file  
</code></pre>

<h4 id="dumpthecontentswithoutpaging">Dump the contents without paging</h4>

<p>I.e., &#8220;concatenate&#8221; or &#8220;cat&#8221; the file:</p>

<pre><code>cat some-file
</code></pre>
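<p>A short practice session that strings these commands together might look like the following sketch. It works in a scratch directory created with <code>mktemp</code>, and the names <code>demo_dir</code>, <code>exercises</code>, and <code>some-file</code> are placeholders chosen for illustration:</p>

```shell
# Create a scratch directory so nothing in the shared account is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
pwd

# Make a subdirectory containing a small two-line file to look at.
mkdir exercises
printf 'first line\nsecond line\n' > exercises/some-file

# List the subdirectory, change into it, dump the file, and go back up.
ls -l exercises
cd exercises
cat some-file
cd ..
```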

<h2 id="formoreinformation">For More Information</h2>

<p>For more information on Amazon Elastic MapReduce commands, see the
<a href="http://s3.amazonaws.com/awsdocs/ElasticMapReduce/latest/emr-qrc.pdf">Quick Reference Guide</a>
and the <a href="http://s3.amazonaws.com/awsdocs/ElasticMapReduce/latest/emr-dg.pdf">Developer Guide</a>. </p>

<p>For more details on Hive, see <a href="http://shop.oreilly.com/product/0636920023555.do">Programming Hive</a> or the <a href="https://cwiki.apache.org/confluence/display/Hive/Home">Hive Wiki</a>.</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>Visit the <a href="http://aws.amazon.com/elasticmapreduce/">AWS EMR Page</a> and the <a href="http://aws.amazon.com/documentation/elasticmapreduce/">EMR Documentation page</a> for more information about EMR. <a href="#fnref:1" title="return to article" class="reversefootnote">&#160;&#8617;</a></p>
</li>

<li id="fn:2">
<p>I used the <a href="http://wiki.answers.com/Q/What_is_the_percent_distribution_of_first_letters_in_last_names_in_the_US">following information</a> to determine a good distribution of users across these clusters. Note that these EMR clusters will be available only during the tutorial. <a href="#fnref:2" title="return to article" class="reversefootnote">&#160;&#8617;</a></p>
</li>

<li id="fn:3">
<p>You should learn how to use <code>bash</code> if you want to use Hadoop. <a href="#fnref:3" title="return to article" class="reversefootnote">&#160;&#8617;</a></p>
</li>

</ol>
</div>

    </div>
</body>
</html>
