shanemhansen / countish-rs

Approximate frequency counting in rust

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Approximate frequency counts over data streams for Rust

Countish implements two approximate counting algorithms outlined in “Approximate Frequency Counts over Data Streams”.

http://www.vldb.org/conf/2002/S10P03.pdf

Currently it only works with strings. The public API isn’t fixed, I expect to support other hashable types. Once I learn a little more rust ; )

See this blog post for some more background on counting and the go implementation: http://whitane.com/post/how-to-countish/

The rust impl is actually much faster than the go version. Which given that I know lots of go and almost no rust is pretty exciting.

Use cases

Have you ever needed to do something like calculate the top URLs or top ips from an infinite stream? This package provides probabalistic frequency counters, with accuracy guarantees and low memory usage.

countish provides an extremely simple interface consisting of an “observe” method and a “items_above_threshold” method.

Example:

cargo build --release
cat urls.txt |  ./target/release/rcountish --threshold 0.3
/ 0.42867142857142854

3 counting implementations are provided.

  1. Naive: exact counts are held in a map
  2. Lossy: corresponding to “lossy counting”
  3. Sticky: corresponding to “sticky sampling”

Example:

examples showing memory usage comparisons

About

Approximate frequency counting in rust


Languages

Language:Rust 100.0%