goldsborough / bloom-cpp

:mag: A header-only bloom filter implementation in C++14.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

🔍 bloom-cpp

A header-only bloom filter implementation in C++14

license build status

API

A Bloom::Filter is parameterized by a size n and the number of hash functions k to use for each insertion or query operation. Use put(key) to add a value to the filter and query(key) to test for membership.

#include <bloom/filter.hpp>

// 32 bits, 3 hash functions.
Bloom::Filter filter(32, 3);

filter.put(42);
filter.put("string");

assert(filter.query(42)); // guaranteed
assert(filter.query("string")); // guaranteed
assert(filter.query(123.456)); // maybe!

filter.clear(); // unset all entries
assert(!filter.query(42));

Notice that the filter is not parameterized by any (template) type, since a bloom filter generally does not store any values. Instead, put() and query() both take a Bloom::Slice -- a simple abstraction over a blob of bytes, i.e. a pointer and a size. Specialize the Bloom::Sliceable trait for your own type to make it usable throughout the library:

namespace Bloom {
template<> struct Sliceable<MyType> {
  Slice to_slice(const MyType& value) const {
    return {value.data(), value.size_in_bytes()};
  }
};
} // namespace Bloom

Bloom::Filter filter(32, 3);
filter.put(MyType(...));
filter.query(MyType(...));

Instead of passing the number of hash functions explicitly, you can also use one of Bloom::Options factory methods to compute the optimal number given either an expected false positive rate, or expected number of inserted values:

Bloom::Filter filter1(Bloom::Options::ForFalsePositiveRate(/*size=*/100, /*fp=*/0.2)); // k = 3
Bloom::Filter filter2(Bloom::Options::ForExpectedCount(/*size=*/100, /*expected_count=*/20)); // k = 4

Finally, the library also provides Bloom::StaticFilter which takes the size and hash count as (non-type) template parameters. Bloom::StaticFilter does not incur any heap allocations for its internal storage. The API is the same as Bloom::Filter.

#include <bloom/static-filter.hpp>

Bloom::StaticFilter<32, 3> filter;
filter.put(123);
filter.query("string");

Documentation

The documentation for this project can be built by running doxygen from within the docs/ folder. This will generate a build/html folder that contains the doxygen HTML output.

Building

The library is header only, so you can simply point your include path at the include/ folder.

If you are using CMake, you can also link against the bloom target, which has the correct include paths configured in its interface properties. For example:

add_executable(my-target)
add_subdirectory(third-party/bloom-cpp)
target_link_libraries(my-target bloom)

Developing

If you would like to contribute to this library, you will likely want to build the tests. For this, run the following commands starting at the project root:

mkdir build && cd build
cmake ..
make -j

This will generate a bloom-test binary that contains the tests. The cmake command will also download googletest on which the tests depend.

There is also a clang-tidy target you can use to run [clang-tidy] over the codebase. For this you should pass -DCMAKE_EXPORT_COMPILE_COMMANDS=ON to the cmake command above. For example:

mkdir build && cd build
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..
make -j
make clang-tidy

License

This project is released under the MIT License. For more information, see the LICENSE file.

About

:mag: A header-only bloom filter implementation in C++14.

License:MIT License


Languages

Language:C++ 88.9%Language:CMake 11.1%