boundary / folsom

Expose Erlang Events and Metrics


folsom_sample_uniform sampling error

nygge opened this issue:

folsom_sample_uniform does not produce a uniform random sample of the values.
Once the reservoir is full, every new value still replaces an existing one, so recent values keep displacing older ones.
This biases the sample toward later values.
For example:
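The issue does not say how the overwritten slot is picked. Assume it is chosen uniformly at random. Then the chance that value i (of n values, reservoir size k) is still present at the end shrinks geometrically with its age. A uniform sample would instead keep every value with the same probability:

$$
P_{\text{kept}}(i) = \left(1 - \frac{1}{k}\right)^{\,n - \max(i,\,k)}
\qquad\text{vs.}\qquad
P_{\text{uniform}}(i) = \frac{k}{n}
$$

Take k = 5000 and n = 50000. Under this assumption, the first value survives with probability (1 - 1/5000)^45000 ≈ e^-9 ≈ 1.2 × 10^-4, not the required 5000/50000 = 0.1.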

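% Feed the values 1..50000 into a uniform reservoir of size 5000 and bind the result to S.
S = lists:foldl(fun(V, A) -> folsom_sample_uniform:update(A, V) end, folsom_sample_uniform:new(5000), lists:seq(1, 50000)).
% Arithmetic mean of the 5000 retained values.
lists:sum(folsom_sample_uniform:get_values(S)) / 5000.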

This gives an arithmetic mean of about 45000 instead of the expected ~25000 (the mean of 1..50000 is 25000.5).
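The reported mean of about 45000 can be estimated, assuming each new value overwrites a uniformly random slot. Here the inputs are 1..n with n = 50000 and reservoir size k = 5000, so each value equals its position in the stream. For any one slot, the last write happened g steps before the end with probability (1/k)(1 - 1/k)^g. So g is roughly geometric with mean k - 1; the cutoff at the initial fill only matters with probability about e^-9:

$$
\mathbb{E}[\bar{x}] \approx n - (k - 1) = 50000 - 4999 = 45001,
\qquad
\text{uniform sample: } \frac{n + 1}{2} = 25000.5
$$

This estimate is in line with the observed ~45000. For comparison, a first-in-first-out overwrite would keep exactly the last 5000 values, giving a mean of 47500.5.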

Exactly which algorithm in the Vitter paper is supposed to be implemented here?
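For comparison, here is a minimal sketch of Algorithm R, the basic reservoir algorithm in Vitter's paper. In it, the n-th value is admitted only with probability k/n, and when admitted it overwrites a uniformly random slot. The module name reservoir_r and its record are hypothetical. They only mirror the new/update/get_values calls used above and are not folsom's code.

```erlang
-module(reservoir_r).
-export([new/1, update/2, get_values/1]).

%% Hypothetical sketch of Algorithm R (not folsom's implementation).
%% size: reservoir capacity k; n: values seen so far; slots: index => value.
-record(res, {size, n = 0, slots = #{}}).

new(Size) when is_integer(Size), Size > 0 ->
    #res{size = Size}.

%% Until the reservoir is full, every value is stored in the next free slot.
update(#res{size = Size, n = N, slots = Slots} = R, Value) when N < Size ->
    R#res{n = N + 1, slots = Slots#{N + 1 => Value}};
%% Afterwards, draw J uniformly from 1..N+1. The new value is kept only if
%% J =< Size (probability Size/(N+1)), and then it overwrites slot J.
update(#res{size = Size, n = N, slots = Slots} = R, Value) ->
    J = rand:uniform(N + 1),
    case J =< Size of
        true  -> R#res{n = N + 1, slots = Slots#{J => Value}};
        false -> R#res{n = N + 1}
    end.

%% Return the sampled values (order is unspecified).
get_values(#res{slots = Slots}) ->
    maps:values(Slots).
```

Feeding 1..50000 into reservoir_r:new(5000), as in the reproduction above, should give a mean close to 25000. With this sketch every input value ends up in the sample with probability 5000/50000.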