pranab / beymani

Hadoop, Spark and Storm based anomaly detection implementations for data quality, cyber security, fraud detection etc.

Home Page: http://pkghosh.wordpress.com/

Introduction

Beymani consists of a set of Hadoop, Spark and Storm based tools for outlier and anomaly detection, which can be used for fraud detection, intrusion detection, etc.

Philosophy

  • Simple to use
  • Input and output in CSV format
  • Metadata defined in a simple JSON file
  • Extremely configurable with tons of configuration knobs
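
To illustrate the CSV plus JSON metadata idea, here is a minimal Python sketch, not beymani's actual code. The metadata key names (`fields`, `name`, `ordinal`, `dataType`), the sample columns and the helper `parse_records` are all hypothetical and chosen only for illustration; the real schema files are in the project's resource directory.

```python
import csv
import io
import json

# Hypothetical metadata: key names are illustrative, not beymani's actual schema.
METADATA_JSON = """
{
  "fields": [
    {"name": "deviceId", "ordinal": 0, "dataType": "string"},
    {"name": "cpuUsage", "ordinal": 1, "dataType": "double"},
    {"name": "numLogins", "ordinal": 2, "dataType": "int"}
  ]
}
"""

# Hypothetical CSV input: one record per line, no header row.
CSV_DATA = "d001,71.5,3\nd002,98.2,41\n"

# Maps the (assumed) type names in the metadata to Python converters.
CONVERTERS = {"string": str, "double": float, "int": int}


def parse_records(csv_text, metadata):
    """Turn CSV rows into typed dictionaries, driven by the JSON metadata."""
    fields = sorted(metadata["fields"], key=lambda f: f["ordinal"])
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        record = {
            f["name"]: CONVERTERS[f["dataType"]](row[f["ordinal"]])
            for f in fields
        }
        records.append(record)
    return records


records = parse_records(CSV_DATA, json.loads(METADATA_JSON))
```

Keeping the column layout in a separate metadata file means the same job can be pointed at a different data set by changing configuration rather than code.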

Blogs

The following blog posts of mine are a good source of details on beymani

Algorithms

  • Univariate distribution model
  • Multivariate sequence or multi-gram distribution model
  • Average instance distance
  • Relative instance density
  • Markov chain with sequence data
  • Spectral residue for sequence data
  • Quantized symbol mapping for sequence data
  • Local outlier factor for multivariate data
  • Instance clustering
  • Sequence clustering
  • Change point detection
  • Isolation Forest for multivariate data
  • Auto Encoder for multivariate data
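
As an illustration of the first item in the list, the sketch below shows one simple way a univariate distribution model can flag outliers: build a histogram from training values, then score a new value by how improbable its bin is. This is a generic Python sketch under assumed parameters, not beymani's implementation; the function names, bin width and threshold are hypothetical.

```python
import math
from collections import Counter


def fit_histogram(values, bin_width):
    """Estimate the probability of each fixed-width bin from training values."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return {bin_index: count / total for bin_index, count in counts.items()}


def outlier_score(value, hist, bin_width):
    """Return 1 minus the estimated probability of the value's bin.

    Values falling in a bin never seen during training score 1.0.
    """
    return 1.0 - hist.get(math.floor(value / bin_width), 0.0)


def detect(values, hist, bin_width, threshold):
    """Return the values whose outlier score exceeds the threshold."""
    return [v for v in values if outlier_score(v, hist, bin_width) > threshold]


# Assumed training data: all readings cluster between 10 and 13.
training = [10, 11, 12, 11, 10, 12, 11, 13, 10, 11]
hist = fit_histogram(training, bin_width=5)

# 48 falls in a bin with no training mass, so it is the only value flagged.
flagged = detect([11, 12, 48], hist, bin_width=5, threshold=0.95)
```

The bin width and threshold trade sensitivity against false alarms, which is the kind of setting the configuration knobs mentioned above would control.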

Getting started

The project's resource directory has various tutorial documents for the use cases described in the blogs.

Build

For Hadoop 1

  • mvn clean install

For Hadoop 2 (non-YARN)

  • git checkout nuovo
  • mvn clean install

For Hadoop 2 (YARN)

  • git checkout nuovo
  • mvn clean install -P yarn

For Spark

  • mvn clean install
  • sbt publishLocal
  • in ./spark, run sbt clean package

Help

Please feel free to email me at pkghosh99@gmail.com

Contribution

Contributors are welcome. Please email me at pkghosh99@gmail.com
