Roger M's repositories
wikipedia-extractor
Extracts and cleans text from a Wikipedia database dump and stores the output in a number of similarly sized files in a given directory. This is a mirror of the script by Giuseppe Attardi.
Avro-Schema-Generator
A tool that generates Avro schemas and Java bindings from XML schemas.
elasticsearch-hadoop
Elasticsearch real-time search and analytics natively integrated with Hadoop
machine-learning
Content for Udacity's Machine Learning curriculum
pipeline
End-to-end, real-time, advanced analytics big data reference pipeline using Spark, Spark SQL, Spark ML, GraphX, Spark Streaming, Kafka, Cassandra, Elasticsearch, Redis, Tachyon, HDFS, Zeppelin, Spark-Notebook, IPython/Jupyter Notebook, and Tableau. See https://github.com/fluxcapacitor/pipeline/wiki for setup instructions.
toy-robot-simulator
A Scala + Akka implementation of the Toy Robot application
vowpal_wabbit
John Langford's original release of Vowpal Wabbit -- a fast online learning algorithm
word2vec-wikipedia-spark
Utility to perform feature extraction via spark-word2vec on the Wikipedia (en) dataset