rurban / adaptive-table

Adaptive data structure for stream processing

Adaptive Table

adaptive-table implements the underlying data structure described in the paper Data Streams as Random Permutations: the Distinct Element Problem.

In the paper, A. Helmi, J. Lumbroso, C. Martínez and A. Viola describe an algorithm for estimating the cardinality of a data stream, that is, its number of distinct elements. The algorithm counts the records in the underlying permutation of elements, ignoring repetitions. In section 4.1 the authors explain how the data structure used for counting the records (the same structure used in MinHash) can "grow" following different strategies. Depending on the strategy, the expected memory usage of the data structure is O(log n) or O(n^α), where 0 < α < 1 and n is the number of distinct elements in the data stream.

This plot shows the final size of the table, for initial sizes of 2, 4, 8, 16, 32 and 64, after inserting up to 5 billion elements:

[Figure: mean_sizes]

This implementation can be used in other algorithms that can take advantage of the adaptive size of the table.

This implementation is also used in some other Go packages.

License: MIT License

Language: Go 100.0%