hiway / python-bloom-filter

Bloom filter for Python

Home Page: https://pypi.org/project/bloom-filter/

Low Hamming weight hash function leads to many false positives

cxsmith opened this issue

Example:

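```python
import bloom_filter

# Sized for 10 elements with a target false-positive rate of 1e-6.
f = bloom_filter.BloomFilter(max_elements=10, error_rate=1E-6)
for x in range(10):
    f.add(x)

# Query 999,990 values that were never added; every hit is a false positive.
false_positives = 0
for y in range(10, 1000000):
    if y in f:
        false_positives += 1

print("Got %d false positives (expected 1)" % false_positives)
```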

This yields 213316 false positives when only about one is expected.
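The "expected 1" figure comes from simple arithmetic: 999,990 keys that were never added, each tested with a false-positive probability of 10^-6, give about one expected hit. The short check below reproduces that and also applies the textbook Bloom filter sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, to max_elements=10 and error_rate=1E-6. Those m and k values are the standard textbook figures, an assumption for illustration; `bloom_filter` may size its array differently.

```python
import math

# Parameters taken from the example in this issue.
n = 10                   # max_elements
p = 1e-6                 # error_rate
queries = 1000000 - 10   # y runs over range(10, 1000000); none were added
observed = 213316        # false positives reported in this issue

# If each absent key independently tests positive with probability p,
# the expected number of false positives is queries * p.
expected = queries * p

# Textbook sizing (not necessarily what bloom_filter computes internally).
m = math.ceil(-n * math.log(p) / math.log(2) ** 2)  # number of bits
k = round(m / n * math.log(2))                        # number of hash functions

print(f"expected false positives: {expected:.2f}")
print(f"observed false-positive rate: {observed / queries:.4f}")
print(f"textbook sizing: m = {m} bits, k = {k} hashes")
```

The observed rate of roughly 0.21 is about five orders of magnitude above the configured 1e-6.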

I researched the problem and have opened pull request #5, which includes a deep dive into the cause as well as a fix with updated unit tests.
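The diagnosis and fix in pull request #5 are not reproduced here. As a hedged illustration of the general remedy the issue title points to, the sketch below derives every bit position from a well-mixed digest (SHAKE-256 from Python's standard `hashlib`) instead of from a hash that leaves low-Hamming-weight keys such as small integers clustered. The class name `MixedBloomFilter`, the `repr`-based key encoding, and the textbook sizing are hypothetical choices for this sketch, not the `bloom_filter` package's API or implementation.

```python
import hashlib
import math


class MixedBloomFilter:
    """Hypothetical Bloom filter whose bit positions come from SHAKE-256.

    Illustrative sketch only: not the bloom_filter package's implementation
    and not the fix proposed in pull request #5.
    """

    def __init__(self, max_elements: int, error_rate: float) -> None:
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = math.ceil(-max_elements * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / max_elements * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: repr(key) is a stable byte encoding (true for ints and strs).
        data = repr(key).encode("utf-8")
        # One extendable-output digest supplies a separate 8-byte chunk per hash
        # function, so keys with few set bits still map to well-spread positions.
        digest = hashlib.shake_256(data).digest(8 * self.num_hashes)
        for i in range(self.num_hashes):
            chunk = digest[8 * i : 8 * i + 8]
            yield int.from_bytes(chunk, "little") % self.num_bits

    def add(self, key) -> None:
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key) -> bool:
        # A key is "possibly present" only if all of its bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

With the issue's parameters this sizes to 288 bits and 20 hash functions. Rerunning the issue's query loop against it should give a count on the order of one, though the exact number depends on the digest. Using independent chunks of one digest, rather than combining two hashes as h1 + i*h2, avoids stride values that share factors with the bit count and collapse the k positions onto a few bits.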