pramodbhn/estado

Introduction
============
Estado is a small Java application that collects Hadoop job status, including counters. The ultimate
goal is to gather job statistics from all Hadoop clusters into a central
repository for further analysis, dashboards, and alerting. An example of an alert is a
counter exceeding some threshold.
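
As a rough illustration of the alerting idea, the sketch below checks a single counter against a threshold. The class name, the `exceeds` method, and the threshold value are hypothetical and are not part of Estado; the counter name simply follows the style of Hadoop's built-in counters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterAlertSketch {
    /** Returns true when the named counter is present and above the threshold. */
    static boolean exceeds(Map<String, Long> counters, String name, long threshold) {
        Long value = counters.get(name);
        return value != null && value > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical counter snapshot for one job.
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("Spilled Records", 4L);
        counters.put("Map output records", 2000L);

        // Assumed alert rule: warn when more than 1000 map output records are produced.
        if (exceeds(counters, "Map output records", 1000L)) {
            System.out.println("ALERT: Map output records above threshold");
        }
    }
}
```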

Architecture
============
Estado should be set up as a cron job, one for each cluster. It can run on any Hadoop node in
a given cluster.

There is a pluggable status consumer that receives the status data and decides what to do with it.
Typically, the consumer will write the status data to one central database, so that all Hadoop job
status data is consolidated in one place.
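
The plug-in idea can be sketched roughly as follows. The `StatusConsumer` interface and its `consume` method are assumptions made for illustration; the actual Estado SPI may look different. The in-memory implementation stands in for a consumer that would write to a central database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConsumerSketch {
    /** Hypothetical plug-in contract; the real Estado interface may differ. */
    interface StatusConsumer {
        void consume(Map<String, String> jobStatus);
    }

    /** Stand-in for a central-repository consumer: keeps records in memory. */
    static class InMemoryConsumer implements StatusConsumer {
        final List<Map<String, String>> records = new ArrayList<>();

        @Override
        public void consume(Map<String, String> jobStatus) {
            // Copy the map so later changes by the caller do not alter the stored record.
            records.add(new LinkedHashMap<>(jobStatus));
        }
    }

    public static void main(String[] args) {
        Map<String, String> status = new LinkedHashMap<>();
        status.put("Cluster", "default");
        status.put("Job Id", "job_201107171959_0001");
        status.put("Status", "completed");

        InMemoryConsumer consumer = new InMemoryConsumer();
        consumer.consume(status);
        System.out.println(consumer.records.size() + " record(s) stored");
    }
}
```

Keeping the contract to a single method means a console, file, or RDBMS consumer can be swapped in without changing the code that collects the status.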

A default JobStatusConsoleConsumer class is already included; it simply writes the data to the
console. In the future, I will provide pluggable consumers for file and RDBMS.

Configuration
=============
The configuration is very simple. Here is an example.

cluster.name=default
status.filter=srfpk
consumer.console.class=org.estado.spi.JobStatusConsoleConsumer

Each Hadoop cluster should be given a unique name. The filter parameter lists the job statuses
to include, where s=succeeded, r=running, f=failed, p=preparing, and k=killed.
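
One way such a filter string could be decoded is sketched below. The `JobState` enum and the `parse` method are hypothetical names, and rejecting unknown letters is an assumption; Estado's own parsing may behave differently.

```java
import java.util.EnumSet;
import java.util.Set;

public class StatusFilterSketch {
    /** Hypothetical enum mirroring the five status letters. */
    enum JobState { SUCCEEDED, RUNNING, FAILED, PREPARING, KILLED }

    /** Maps each letter of the filter (e.g. "srfpk") to a job state. */
    static Set<JobState> parse(String filter) {
        Set<JobState> states = EnumSet.noneOf(JobState.class);
        for (char c : filter.toCharArray()) {
            switch (c) {
                case 's': states.add(JobState.SUCCEEDED); break;
                case 'r': states.add(JobState.RUNNING); break;
                case 'f': states.add(JobState.FAILED); break;
                case 'p': states.add(JobState.PREPARING); break;
                case 'k': states.add(JobState.KILLED); break;
                default:
                    // Assumption: unknown letters are treated as a configuration error.
                    throw new IllegalArgumentException("Unknown status code: " + c);
            }
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(parse("fk")); // prints [FAILED, KILLED]
    }
}
```

With this reading, `status.filter=fk` would restrict collection to failed and killed jobs, while `srfpk` selects all five statuses.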

The last line provides the consumer class name. The second component of the property
name is a name (e.g., console) for the consumer. Some consumers may need additional parameters. Here is 
an example for a file consumer.

consumer.file.class=org.estado.spi.JobStatusFileConsumer
consumer.file.name=/var/log/estado.log

Here is the configuration for an RDBMS consumer using JDBC:

consumer.rdbms.class=org.estado.spi.JobStatusRdbmsConsumer
consumer.rdbms.url=jdbc:mysql://localhost/starea?user=admin&password=welcome
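
The naming convention described above (`consumer.<name>.<parameter>`) lends itself to grouping properties by consumer name. The following is a minimal sketch of that idea using `java.util.Properties`; the class and method names are hypothetical and do not reflect how Estado itself reads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ConsumerConfigSketch {
    /** Groups "consumer.<name>.<param>" entries into one parameter map per consumer name. */
    static Map<String, Map<String, String>> groupByConsumer(Properties props) {
        Map<String, Map<String, String>> consumers = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Split into at most three parts: "consumer", the consumer name, the parameter.
            String[] parts = key.split("\\.", 3);
            if (parts.length == 3 && parts[0].equals("consumer")) {
                consumers.computeIfAbsent(parts[1], n -> new TreeMap<>())
                         .put(parts[2], props.getProperty(key));
            }
        }
        return consumers;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
            "cluster.name=default\n"
            + "consumer.console.class=org.estado.spi.JobStatusConsoleConsumer\n"
            + "consumer.file.class=org.estado.spi.JobStatusFileConsumer\n"
            + "consumer.file.name=/var/log/estado.log\n"));

        // Each entry maps a consumer name to its parameters (class, name, url, ...).
        System.out.println(groupByConsumer(props));
    }
}
```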

Execution
=========
Here is a sample script to run Estado:

JAR=/home/pranab/.m2/repository/mawazo/estado/1.0/estado-1.0.jar
CL=org.estado.core.JobStatusChecker
CONFIG=/home/pranab/Projects/run/estado/estado.properties
hadoop jar $JAR $CL $CONFIG


Sample Output
=============
Here is some sample output from the console consumer.

Cluster:default
Job Id:job_201107171959_0001
User:pranab
Start time:2011-07-17 08:04:03
End time:2011-07-17 08:06:20
Duration:02min:17sec:230ms
Map progress:100
Reduce progress:100
Status:completed

Counter group:Job Counters
	Launched reduce tasks=1
	SLOTS_MILLIS_MAPS=81621
	Total time spent by all reduces waiting after reserving slots (ms)=0
	Total time spent by all maps waiting after reserving slots (ms)=0
	Launched map tasks=1
	Data-local map tasks=1
	SLOTS_MILLIS_REDUCES=52691
Counter group:FileSystemCounters
	FILE_BYTES_READ=349
	HDFS_BYTES_READ=25157
	FILE_BYTES_WRITTEN=115394
	HDFS_BYTES_WRITTEN=331
Counter group:Map-Reduce Framework
	Reduce input groups=2
	Combine output records=2
	Map input records=1000
	Reduce shuffle bytes=349
	Reduce output records=2
	Spilled Records=4
	Map output bytes=19776
	Combine input records=2000
	Map output records=2000
	SPLIT_RAW_BYTES=111
	Reduce input records=2