junkumar / hll-hive-udf

Approximate cardinality estimation with HyperLogLog, as a Hive function

An implementation of the HyperLogLog and Linear Counting approximate cardinality estimation algorithms as a Hive User-Defined Aggregate Function (UDAF).

It relies on Clearspring's stream-lib for the implementations of both algorithms.
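To give a feel for what such an aggregate does internally, here is a minimal, self-contained HyperLogLog sketch in plain Java. It is an illustration only: it is not the stream-lib code and not this project's UDAF. The class name `HllSketch`, its methods, and the SplitMix64-style hash mixer are all assumptions made for the example. `offer` corresponds to processing one row, `merge` to combining partial aggregates from different tasks, and `cardinality` to producing the final estimate. For small counts the estimate falls back to the Linear Counting formula, following the standard small-range correction.

```java
/** Minimal HyperLogLog sketch for illustration; hypothetical, not the stream-lib implementation. */
final class HllSketch {
    private final int p;            // number of index bits
    private final byte[] registers; // m = 2^p registers, each holding a max rank

    HllSketch(int p) {
        if (p < 4 || p > 16) throw new IllegalArgumentException("p must be in [4, 16]");
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** SplitMix64-style finalizer (an assumption of this sketch): spreads hashCode() over 64 bits. */
    private static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Adds one value: the top p bits pick a register, the remaining bits give the rank. */
    void offer(Object value) {
        long h = mix64(value.hashCode());
        int idx = (int) (h >>> (64 - p));
        long rest = h << p;
        // Rank = position of the first 1-bit in the remaining bits, capped when they are all zero.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    /** Merges another sketch of the same precision by taking the register-wise maximum. */
    void merge(HllSketch other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) registers[i] = other.registers[i];
        }
    }

    /** Returns the estimated number of distinct values offered so far. */
    long cardinality() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = (m == 16) ? 0.673
                     : (m == 32) ? 0.697
                     : (m == 64) ? 0.709
                     : 0.7213 / (1.0 + 1.079 / m);
        double estimate = alpha * m * (double) m / sum;
        // Small-range correction: use Linear Counting while some registers are still empty.
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);
        }
        return Math.round(estimate);
    }

    public static void main(String[] args) {
        HllSketch sketch = new HllSketch(12); // 4096 registers, roughly 1.6% standard error
        for (int i = 0; i < 100_000; i++) sketch.offer("user-" + i);
        System.out.println("estimated distinct values: " + sketch.cardinality());
    }
}
```

Because `merge` is a register-wise maximum, partial sketches can be combined in any order, which is the property that makes the algorithm a natural fit for a distributed aggregate.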

See the original project's Wiki for usage instructions.

About

License: Apache License 2.0


Languages

Language: Java 100.0%