Clayton Kim's starred repositories

luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

Language: Python | License: Apache-2.0 | Stargazers: 17460 | Issues: 473 | Issues: 983
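The dependency resolution Luigi describes boils down to running tasks in an order where every task comes after the tasks it requires, and skipping tasks that are already complete. The sketch below illustrates only that idea with the standard library's `graphlib` (Python 3.9+); it is not Luigi's API, and the task names in `PIPELINE` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it requires.
# These names are illustrative and are not part of Luigi.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_order(graph, done=frozenset()):
    """Return tasks in a dependency-respecting order, skipping ones already complete."""
    order = TopologicalSorter(graph).static_order()
    return [task for task in order if task not in done]


print(run_order(PIPELINE))                    # full run
print(run_order(PIPELINE, done={"extract"}))  # resume after "extract" finished
```

In Luigi itself, tasks declare dependencies through their own methods and completeness is judged by whether a task's output exists; the `done` set here merely stands in for that check.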

deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

Language: Java | License: Apache-2.0 | Stargazers: 13486 | Issues: 767 | Issues: 5751

smile

Statistical Machine Intelligence & Learning Engine

Language: Java | License: NOASSERTION | Stargazers: 5960 | Issues: 268 | Issues: 598

chronos

Fault-tolerant job scheduler for Mesos which handles dependencies and ISO 8601-based schedules

Language: Scala | License: Apache-2.0 | Stargazers: 4371 | Issues: 288 | Issues: 452
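ISO 8601 schedules of the kind Chronos accepts are repeating intervals of the form `R<n>/<start>/<period>`. The sketch below expands such a string into concrete run times; it is an illustration, not Chronos code. It handles only day/hour/minute/second durations, assumes `R<n>` means n runs, and treats a bare `R` as "repeat indefinitely", capped by a hypothetical `limit` parameter.

```python
import re
from datetime import datetime, timedelta

# Only a subset of ISO 8601 durations is handled: days, hours, minutes, seconds.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text):
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def occurrences(schedule, limit=10):
    """Expand 'R<n>/<start>/<period>' into at most `limit` run times.

    A bare 'R' (no count) is treated as repeating indefinitely.
    """
    repeat, start, period = schedule.split("/")
    if not repeat.startswith("R"):
        raise ValueError(f"not a repeating interval: {schedule!r}")
    count = int(repeat[1:]) if repeat[1:] else limit
    first = datetime.fromisoformat(start.replace("Z", "+00:00"))
    step = parse_duration(period)
    return [first + i * step for i in range(min(count, limit))]


for run in occurrences("R3/2024-01-01T00:00:00Z/PT12H"):
    print(run.isoformat())
```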

shapeless

Generic programming for Scala

Language: Scala | License: Apache-2.0 | Stargazers: 3370 | Issues: 103 | Issues: 410

guesstimate-app

Create Fermi estimates and perform Monte Carlo estimates

Language: TypeScript | License: MIT | Stargazers: 2327 | Issues: 56 | Issues: 395
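A Monte Carlo Fermi estimate replaces each uncertain input with a distribution, samples every input many times, and reads percentiles off the resulting outputs. The sketch below does this for the classic "piano tuners in a city" question. The input ranges are made up for illustration, and modeling each range as the 90% interval of a lognormal is an assumption of this sketch, not a description of Guesstimate's internals.

```python
import math
import random
import statistics

Z90 = 1.6448536269514722  # z-score bounding the central 90% of a normal distribution


def lognormal_from_ci(rng, low, high):
    """Sample a lognormal whose 5th and 95th percentiles are `low` and `high`."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return rng.lognormvariate(mu, sigma)


def estimate_piano_tuners(n=20_000, seed=42):
    """Return (5th percentile, median, 95th percentile) of a hypothetical estimate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        population = lognormal_from_ci(rng, 2e6, 4e6)
        people_per_household = lognormal_from_ci(rng, 2, 3)
        piano_share = lognormal_from_ci(rng, 0.02, 0.10)
        tunings_per_piano = lognormal_from_ci(rng, 0.5, 2)    # per year
        tunings_per_tuner = lognormal_from_ci(rng, 500, 1000)  # per year
        pianos = population / people_per_household * piano_share
        samples.append(pianos * tunings_per_piano / tunings_per_tuner)
    cuts = statistics.quantiles(samples, n=20)  # cuts[0] = 5th, cuts[-1] = 95th
    return cuts[0], statistics.median(samples), cuts[-1]


print(estimate_piano_tuners())
```

The payoff over a single point estimate is the spread: the 5th to 95th percentile band shows how much the answer depends on the least certain inputs.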

sixpack

Sixpack is a language-agnostic A/B testing framework

Language: Python | License: BSD-2-Clause | Stargazers: 1757 | Issues: 71 | Issues: 195
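Evaluating an A/B test on conversions usually means asking whether the difference between two conversion rates is larger than chance would explain. One common frequentist check is the pooled two-proportion z-test, sketched below. Sixpack's own statistics may differ; this is a generic illustration with made-up counts.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null of no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 10% vs 12.5% conversion on 2000 users each.
print(two_proportion_z_test(200, 2000, 250, 2000))
```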

aas

Code to accompany Advanced Analytics with Spark from O'Reilly Media

Language: Scala | License: NOASSERTION | Stargazers: 1515 | Issues: 148 | Issues: 106

egads

A Java package to automatically detect anomalies in large-scale time-series data

Language: Java | License: NOASSERTION | Stargazers: 1158 | Issues: 114 | Issues: 36
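The simplest form of automatic time-series anomaly detection compares each point against a model of recent history and flags large deviations. The sketch below uses a trailing-window z-score as that model. It is a deliberately basic stand-in chosen for illustration, not one of EGADS's algorithms, and `window` and `threshold` are hypothetical tuning parameters.

```python
import statistics


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Flat history: any change at all counts as anomalous.
            if series[i] != mean:
                anomalies.append(i)
            continue
        if abs(series[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies


print(rolling_zscore_anomalies([10, 11, 10, 12, 11, 10, 11, 50, 11, 10, 12]))
```

Note that once the spike enters the window it inflates the standard deviation for the next few points, which is one reason production systems use more robust models.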

datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Language: Java | License: Apache-2.0 | Stargazers: 1086 | Issues: 137 | Issues: 29

bamboo

HAProxy auto configuration and auto service discovery for Mesos Marathon

Language: Go | License: Apache-2.0 | Stargazers: 793 | Issues: 72 | Issues: 125

Language: Shell | License: Apache-2.0 | Stargazers: 764 | Issues: 65 | Issues: 47

kafka-storm-starter

[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Language: Scala | License: NOASSERTION | Stargazers: 727 | Issues: 91 | Issues: 12

kafka-spark-consumer

High-performance Kafka connector for Spark Streaming. Supports multi-topic fetch, Kafka security, and reliable offset management in ZooKeeper, with no data loss and no dependency on HDFS or the write-ahead log (WAL). Includes a built-in PID rate controller, message handler support, and an offset lag checker.

Language: Java | License: Apache-2.0 | Stargazers: 631 | Issues: 70 | Issues: 61
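A PID rate controller for a streaming consumer adjusts how fast records are fetched based on the gap between the ingestion rate and the rate the job actually manages to process. The class below is a generic, hypothetical PID sketch of that feedback loop; the gains, the `min_rate` floor, and the update rule are assumptions for illustration and are not taken from this connector's code.

```python
class PIDRateController:
    """Hypothetical PID controller that steers an ingestion rate toward the
    observed processing rate."""

    def __init__(self, kp=0.5, ki=0.0, kd=0.0, min_rate=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_rate = min_rate
        self.integral = 0.0
        self.prev_error = None

    def update(self, current_rate, processed_rate, dt=1.0):
        # Positive error means records arrive faster than they are processed.
        error = current_rate - processed_rate
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        correction = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Never throttle below a floor, so the consumer keeps making progress.
        return max(self.min_rate, current_rate - correction)
```

In use, the controller is called once per batch with the rate that was requested and the rate that was achieved; with the default proportional-only gains, a consumer fetching 1000 records/s against a capacity of 600 records/s steps down to 800, 700, 650, and so on, converging toward 600.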

emr-bootstrap-actions

This repository holds sample bootstrap actions for Amazon Elastic MapReduce (EMR).

Language: Shell | License: NOASSERTION | Stargazers: 614 | Issues: 78 | Issues: 91

vagrant-mesos

Spin up your Mesos Cluster with Vagrant! (VirtualBox and AWS)

Language: Ruby | License: MIT | Stargazers: 432 | Issues: 32 | Issues: 57

hadoop-ansible

Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.

Language: Shell | License: Apache-2.0 | Stargazers: 418 | Issues: 47 | Issues: 13

wirbelsturm

[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Language: Shell | License: NOASSERTION | Stargazers: 330 | Issues: 36 | Issues: 37

keras-wtte-rnn

Demo of a Weibull time-to-event recurrent neural network (WTTE-RNN) in Keras

Language: Python | License: MIT | Stargazers: 217 | Issues: 13 | Issues: 7
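A Weibull time-to-event model predicts a scale α and shape β for the time until the next event, and is trained by maximizing a likelihood that also handles right-censored sequences (where the event has not happened yet). With event indicator u, the continuous log-likelihood is u·(log β − log α + (β − 1)·log(t/α)) − (t/α)^β. The plain-Python sketch below evaluates that expression; it is not the repository's Keras loss, and the function names are hypothetical.

```python
import math


def weibull_loglik(t, u, alpha, beta):
    """Log-likelihood of one observation under a continuous Weibull(alpha, beta).

    u = 1: the event happened at time t (log density).
    u = 0: the observation was right-censored at t (log survival).
    Requires t > 0, alpha > 0, beta > 0.
    """
    z = t / alpha
    log_density_extra = math.log(beta) - math.log(alpha) + (beta - 1) * math.log(z)
    return u * log_density_extra - z ** beta


def mean_nll(observations, alpha, beta):
    """Average negative log-likelihood over (t, u) pairs, i.e. a loss to minimize."""
    return -sum(weibull_loglik(t, u, alpha, beta) for t, u in observations) / len(observations)


print(mean_nll([(2.0, 1), (3.0, 0)], alpha=4.0, beta=1.0))
```

Censored observations contribute only the survival term −(t/α)^β, which pushes predicted event times later without pretending the event occurred.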

mario

Functional, Typesafe, Declarative Data Pipelines

Language: Scala | License: MIT | Stargazers: 139 | Issues: 90 | Issues: 2

kafka-deploy

Automated deploy for Kafka on AWS

Language: Clojure | License: NOASSERTION | Stargazers: 123 | Issues: 9 | Issues: 0

kcbo

A Bayesian testing framework written in Python.

Language: Python | License: MIT | Stargazers: 95 | Issues: 9 | Issues: 1
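A common Bayesian alternative to a significance test for conversion experiments puts a Beta prior on each variant's conversion rate, updates it with the observed counts, and estimates the probability that B beats A by sampling both posteriors. The sketch below does exactly that with a uniform Beta(1, 1) prior; it is an illustration of the general approach and an assumption about the kind of test meant, not kcbo's API.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0, prior=(1, 1)):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta-Binomial posteriors."""
    rng = random.Random(seed)
    a0, b0 = prior
    wins = 0
    for _ in range(draws):
        # Posterior for each variant: Beta(prior_a + successes, prior_b + failures).
        p_a = rng.betavariate(a0 + conv_a, b0 + n_a - conv_a)
        p_b = rng.betavariate(a0 + conv_b, b0 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws


# Hypothetical counts: 10% vs 12.5% conversion on 2000 users each.
print(prob_b_beats_a(200, 2000, 250, 2000))
```

Unlike a p-value, the result reads directly as "how likely is B to be better", which is often easier to act on.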

lift

...Do you even? Exercise in exercise analysis

Language: C++ | License: NOASSERTION | Stargazers: 91 | Issues: 27 | Issues: 25

ReactiveLDA

ReactiveLDA is a fast, lightweight implementation of the Latent Dirichlet Allocation (LDA) algorithm, using a parallel vanilla Gibbs sampling algorithm.

Language: Scala | License: MIT | Stargazers: 62 | Issues: 22 | Issues: 1

spark-ec2

[NOTE: Repository has moved to github.com/amplab/spark-ec2]

Language: Shell | License: Apache-2.0 | Stargazers: 57 | Issues: 23 | Issues: 0

Foundry-vagrant-mesos-kafka-cluster

A Vagrant/Ansible setup for Kafka, Mesos (with Marathon/Docker), ZooKeeper, Hadoop, and Spark, with service discovery via HAProxy and Bamboo.

Language: Shell | License: MIT | Stargazers: 50 | Issues: 9 | Issues: 3

docker-basenode

Docker service discovery where applications in each container route traffic through localhost haproxy to connect to other services in the cluster. Don't hardcode IP addresses.

Language: Scala | License: Apache-2.0 | Stargazers: 21 | Issues: 17 | Issues: 3

Dockerfiles

Dockerfiles

Language: Python | License: MIT | Stargazers: 9 | Issues: 2 | Issues: 8

nerve-etcd

Nerve registration container (etcd backend)

Language: Shell | License: NOASSERTION | Stargazers: 6 | Issues: 3 | Issues: 1