mpatricio / avenir

Set of Machine Learning and Stochastic Optimization tools based on Hadoop, Spark and Storm https://pkghosh.wordpress.com/


Introduction

A set of predictive and exploratory machine learning tools that runs on Hadoop, Spark and Storm.

Philosophy

  • Simple to use
  • Input and output in CSV format
  • Metadata defined in a simple JSON file
  • Extremely configurable, with a large number of configuration parameters
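To illustrate the idea of CSV input whose layout is described by separate metadata, here is a minimal Java sketch. The field names, ordinals and type strings are hypothetical and are not avenir's actual JSON schema. To keep the example self-contained, the metadata is built directly as Java objects instead of being parsed from a JSON file. It assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: the field names, ordinals and types below are
// invented for this sketch and are NOT avenir's actual JSON metadata schema.
public class CsvSchemaSketch {

    // One field descriptor, as it might be read from a JSON metadata file.
    public record Field(String name, int ordinal, String dataType) {}

    // Convert one CSV line into typed values using the field descriptors.
    public static List<Object> parse(String line, List<Field> fields) {
        String[] items = line.split(",", -1);
        List<Object> values = new ArrayList<>();
        for (Field f : fields) {
            // The ordinal says which CSV column holds this field.
            String raw = items[f.ordinal()].trim();
            switch (f.dataType()) {
                case "int" -> values.add(Integer.parseInt(raw));
                case "double" -> values.add(Double.parseDouble(raw));
                default -> values.add(raw); // categorical or plain string field
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical metadata for a three-column customer file.
        List<Field> fields = List.of(
            new Field("custID", 0, "string"),
            new Field("age", 1, "int"),
            new Field("income", 2, "double"));
        System.out.println(parse("C1001,34,52000.5", fields));
    }
}
```

Running `main` prints `[C1001, 34, 52000.5]`. Keeping the schema outside the code means the same job can process a different CSV layout by changing only the metadata.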

Solution

  • Exploratory analytics, including correlation and feature subset selection
  • Naive Bayes
  • Discriminant analysis
  • Nearest neighbor
  • Decision tree and Random Forest
  • Association mining
  • Reinforcement learning
  • Stochastic Optimization
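As a rough illustration of one of the listed techniques, the following is a generic categorical Naive Bayes classifier with Laplace (add-one) smoothing, trained on CSV-style rows whose last column is the class label. It is a textbook sketch, not avenir's Hadoop or Spark implementation; the class name, the row layout and the toy weather data are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Generic categorical Naive Bayes with Laplace smoothing.
// NOT avenir's implementation; it only illustrates the technique.
public class NaiveBayesSketch {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Key: class + "|" + featureIndex + "|" + featureValue
    private final Map<String, Integer> featureCounts = new HashMap<>();
    // Distinct values seen for each feature index, used for smoothing.
    private final Map<Integer, Set<String>> featureValues = new HashMap<>();
    private int total = 0;

    // Each row holds categorical feature values followed by the class label.
    public void train(List<String[]> rows) {
        for (String[] row : rows) {
            String label = row[row.length - 1];
            classCounts.merge(label, 1, Integer::sum);
            total++;
            for (int i = 0; i < row.length - 1; i++) {
                featureCounts.merge(label + "|" + i + "|" + row[i], 1, Integer::sum);
                featureValues.computeIfAbsent(i, k -> new HashSet<>()).add(row[i]);
            }
        }
    }

    // Return the class with the highest log posterior for the given features.
    public String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String label = e.getKey();
            int classCount = e.getValue();
            double score = Math.log((double) classCount / total); // log prior
            for (int i = 0; i < features.length; i++) {
                int count = featureCounts.getOrDefault(label + "|" + i + "|" + features[i], 0);
                int distinct = featureValues.getOrDefault(i, Set.of()).size();
                // Laplace-smoothed log likelihood: (count + 1) / (classCount + distinct)
                score += Math.log((count + 1.0) / (classCount + distinct));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: outlook, temperature, class label.
        List<String[]> rows = List.of(
            "sunny,hot,no".split(","),
            "sunny,mild,no".split(","),
            "rainy,mild,yes".split(","),
            "overcast,hot,yes".split(","),
            "rainy,cool,yes".split(","));
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(rows);
        System.out.println(nb.predict(new String[] {"rainy", "mild"}));
    }
}
```

With this toy data the call predicts `yes` for `rainy, mild`. Scores are summed in log space so that products of many small probabilities do not underflow when there are many features.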

Blogs

My blog posts at https://pkghosh.wordpress.com/ are a good source of details about avenir. They are the only source of detailed documentation.

Getting started

The project's resource directory contains tutorial documents for the use cases described in the blogs.

Configuration

All configuration parameters are described on the wiki page https://github.com/pranab/avenir/wiki/Configuration

Build

Please refer to resource/dependency.txt for build-time and run-time dependencies.

For Hadoop 1

  • mvn clean install

For Hadoop 2 (non-YARN)

  • git checkout nuovo
  • mvn clean install

For Hadoop 2 (YARN)

  • git checkout nuovo
  • mvn clean install -P yarn

Help

Please feel free to email me at pkghosh99@gmail.com

Contribution

Contributors are welcome. Please email me at pkghosh99@gmail.com
