bloomberg

Work projects for Hadoop and HBase monitoring / testing.

projects

jmx-client: client for accessing HBase / HDFS and exporting RESTful JSON JMX metrics.
jmx-metrics: library for background JMX Hadoop metrics2 logging.
mapreduce: a standard MapReduce job, the first one I have done properly.
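
For orientation, Hadoop and HBase daemons serve their JMX metrics as JSON over HTTP at a /jmx path, and the qry parameter filters by MBean name. The sketch below is not the jmx-client code from this repo; it only illustrates how such a request could be built and sent with the JDK's own HTTP client (Java 11+). The class name JmxJsonFetch, the host namenode.example.com, and the port 9870 are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this repo.
public class JmxJsonFetch {

    // Builds the URL of a daemon's /jmx servlet, filtering MBeans via "qry".
    public static String buildUrl(String host, int port, String mbeanQuery) {
        String encoded = URLEncoder.encode(mbeanQuery, StandardCharsets.UTF_8);
        return "http://" + host + ":" + port + "/jmx?qry=" + encoded;
    }

    // Fetches the JSON body; throws if the daemon does not answer with HTTP 200.
    public static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed host and port; adjust to the cluster being monitored.
        String url = buildUrl("namenode.example.com", 9870,
                "Hadoop:service=NameNode,name=FSNamesystem");
        System.out.println(url);
        // Only touch the network when explicitly asked to.
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(url));
        }
    }
}
```

Run without arguments it only prints the request URL; pass --fetch to actually query a reachable daemon.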

very important things to read.

The Google File System (2003):
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
[https://research.google.com/archive/gfs.html]
MapReduce: Simplified Data Processing on Large Clusters (2004):
Jeffrey Dean and Sanjay Ghemawat
[https://research.google.com/archive/mapreduce.html]
Web Search for a Planet: The Google Cluster Architecture (2003):
Luiz André Barroso, Jeffrey Dean, and Urs Hölzle
[https://static.googleusercontent.com/media/research.google.com/en//archive/googlecluster-ieee.pdf]
Experiences with MapReduce, an Abstraction for Large-Scale Computation (2006):
Jeffrey Dean
[https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/32721.pdf]
The Anatomy of a Large-Scale Hypertextual Web Search Engine (1998):
Sergey Brin and Lawrence Page
[https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/334.pdf]
Impossibility of Distributed Consensus with One Faulty Process (1985):
Michael Fischer, Nancy Lynch, and Michael Paterson
[https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf]
Paxos Made Simple (2001):
Leslie Lamport
[http://lamport.azurewebsites.net/pubs/paxos-simple.pdf]

distributed computing scratchpad for my job.
