maheshsv / hoidla

A set of real-time algorithms for big data streaming platforms

Home Page: http://pkghosh.wordpress.com/

Introduction

A set of reusable real-time streaming algorithms for big data. They can be used from Spark Streaming, Storm, or any other stream computation framework.

Philosophy

  • Plain Java API that can be used from any stream computation framework

Blogs

The following blogs of mine are a good source of details. They are the only source of detailed documentation.

Solution

  • Probabilistic frequent item counting with sketches and count-based algorithms
  • Probabilistic cardinality (unique item count) estimation
  • Probabilistic set inclusion
  • Different sampling methods
  • Windowing including simple stats
  • Pattern detection
  • Event cluster detection
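
As an illustration of the first item in the list above, here is a minimal sketch of a Count-Min Sketch, one well-known sketch structure for approximate frequency counting over a stream. It is not taken from hoidla: the class name `CountMinSketch`, its constructor parameters, and the hash mixing are all assumptions made for this example, and the library's own API may look quite different.

```java
import java.util.Random;

/**
 * Illustrative Count-Min Sketch (hypothetical class, not hoidla's API).
 * Estimates never fall below the true count; they may exceed it when
 * different items collide in every row.
 */
final class CountMinSketch {
    private final int depth;        // number of hash rows
    private final int width;        // counters per row
    private final long[][] counts;  // depth x width counter table
    private final int[] seeds;      // one hash seed per row

    CountMinSketch(int depth, int width, long seed) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    /** Maps an item to a counter index in the given row. */
    private int bucket(String item, int row) {
        int h = item.hashCode() ^ seeds[row];
        // Simple integer bit mixing so that rows behave differently.
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return Math.floorMod(h, width);
    }

    /** Records one occurrence of the item. */
    void add(String item) {
        for (int row = 0; row < depth; row++) {
            counts[row][bucket(item, row)]++;
        }
    }

    /** Returns the minimum counter across rows as the frequency estimate. */
    long estimate(String item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, counts[row][bucket(item, row)]);
        }
        return min;
    }

    public static void main(String[] args) {
        CountMinSketch sketch = new CountMinSketch(4, 1024, 42L);
        for (String s : new String[] {"a", "b", "a", "c", "a", "b"}) {
            sketch.add(s);
        }
        System.out.println("estimated count of a: " + sketch.estimate("a"));
    }
}
```

Because the class is plain Java with no framework dependencies, an instance could be held inside a Storm bolt or a Spark Streaming function and updated per event. Memory use is fixed at `depth * width` counters regardless of how many distinct items appear in the stream.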

Getting started

Project's resource directory has various tutorial documents for the use cases described in the blogs.

Help

Please feel free to email me at pkghosh99@gmail.com

Languages

  • Java 99.9%
  • Scala 0.1%