Justin Miller's repositories
clean-hadoop-tmp
Cleans up data older than N seconds in /tmp on HDFS.
toolbox
Hadoop, NoSQL, Web, Unix tools: HDFS performance debugger and other tools, watch_url.pl for load balanced web farms, XML / running Hadoop cluster config diff, Ambari FreeIPA Kerberos deployment, Pig => Solr, Pig & Hive => Elasticsearch, Linux CLI tools e.g. scrub.pl config/log anonymizer for posting to online forums, etc.
cloudera-scm
Cloudera SCM Operations
CloudForms_Essentials
Red Hat CloudForms Essentials Project
DataSciencePython
Common data analysis and machine learning tasks using Python
datasharing
The Leek group guide to data sharing
hadoop-book
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
hadoop-pcap
Hadoop library to read packet capture (PCAP) files
HadoopDNSVerifier
This tool is expected to be run locally on the client in question and cannot be used to verify a remote client's DNS configuration settings.
json-data-generator
A robust, generic, streaming random json data generator for your data
makemeasandwich.js
A Node.js + Phantom.js command-line application that will automatically order you a sandwich from Jimmy John's (http://xkcd.com/149).
MyNotes
Self-written notes that may be useful
nodejs-ex
Node.js example
opsweekly
On call alert classification and reporting
papers-we-love
Papers from the computer science community to read and discuss.
perf-tools
Performance analysis tools based on Linux perf_events (aka perf) and ftrace
pgosquery
Like Facebook's OSQuery, but for Postgres
relevant-search-book
Code and Examples for Relevant Search
seesaw
Seesaw v2 is a Linux Virtual Server (LVS) based load balancing platform.
signal-collect
A framework for scalable graph computing.
Wonderland-Scala-Katas
Scala port of gigasquid/wonderland-clojure-katas