scalaz / scalaz-analytics

A high-performance, purely-functional library for doing computational analysis and statistics over data in a type-safe way

scalaz-analytics

Goal

Scalaz Analytics provides a high-performance, purely-functional library for doing computational analysis and statistics over data in a type-safe way.

Introduction & Highlights

Scalaz Analytics is a principled functional programming library for data processing and analytics.

  • Simple and principled
  • First-class support for analytics and data science
  • Pure, type-safe, functional interface that integrates with other Scalaz projects
  • Supports batch and streaming
  • Efficient on both small and large data sets, single machine and distributed
  • Can be used from a REPL for interactive analysis or as a library for applications

Other libraries

Below is a selection of analytics and data-processing libraries that we are using as inspiration. Some of these metrics are somewhat subjective, but they give an idea of what we are looking at in each library. Note that these metrics assume native support, so a capability that a library achieves only via another library is not counted.

Library | Scales to Big Data | Supports Batch | Supports Streaming | FP | Easy to Debug | Out of the box analytics
Spark ✔ (mini batch)
Flink
Pandas
R ?
Dask ?
Apex ?
Beam ?

Background

About

License: Apache License 2.0


Languages

Language: Scala 100.0%