Tom Mulder's repositories
blocks
A Theano framework for building and training neural networks
cloudera-csd-dev
CSD + parcel to expose environment variables and files in a browser to help developers build CSDs
DASMaps
DAS maps
divolte-collector
Divolte data collector
dl-machine
Scripts to set up a GPU / CUDA-enabled compute server with libraries for deep learning
elastic4s
Scala client and DSL for Elasticsearch
elasticsearch-prediction
Elasticsearch Prediction Generator and Plugin
elasticsearch-prediction-spark
Generates an Elasticsearch plugin to score/evaluate Spark-trained models
hbase-rdd
Spark RDD to read from and write to HBase
hue
Let’s big data. Hue is a Web interface for analyzing data with Apache Hadoop. It supports a file and job browser, Hive, Pig, Impala, Spark, Oozie editors, Solr Search dashboards, HBase, Sqoop2, and more.
impyla
Python client and Numba-based UDFs for Impala
IPython-notebooks
Some IPython notebooks I've created...
jade
A computer vision project to infer people's gender, age, and personality traits from Facebook profile photos.
kafka-rest
REST Proxy for Kafka
KafkaOffsetMonitor
A little app to monitor the progress of Kafka consumers and their lag with respect to the queue.
LearningSpark
Scala examples for learning to use Spark
medicare-demo
A demo of how to use PageRank with Hadoop and SociaLite to identify anomalies in healthcare data
overview-server
Open source large document set visualization platform
PredictionIO
PredictionIO, a machine learning server for developers and data scientists. Built on Apache Spark, HBase and Spray.
presto-csd
Presto for Cloudera Manager
presto-parcel
Presto parcel for Cloudera Manager
seldon-spark
Seldon Spark Jobs
spark-playground
Playground for experimenting with Apache Spark
storm-csd
Storm custom service descriptor (CSD) for Cloudera Manager
storm-parcel
Storm parcel for Cloudera Manager
TesseractOCR
Full-text extraction using the open-source Tesseract OCR software (https://code.google.com/p/tesseract-ocr/) and ImageMagick