everpeace / bloom-filter

This is an implementation of Bloom Filter, especially CountingBloomFilter.

Home Page:https://github.com/everpeace/bloom-filter

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This ia an implementation of Bloom Filter, especially Counting Bloom Filter.

What is Bloom Filter?:
see http://en.wikipedia.org/wiki/Bloom_filter.


How to build and run the sample:
1. set up Simple Build Tool(http://code.google.com/p/simple-build-tool/)
2. run sbt
3. update
4. test 
5. doc (optional)
6. run <false positive prob. to set to the filter> <number of items>


Sample Scenario:
0.initialize a CountingBloomFilter with input max false positive probability and input expected number of items. 
1.add random distinct 2n(twice of your input) strings to the filter
2.discard n strings in added strings from the filter 
3.check that the filter contains each discarded string(i.e. measure false positive prob.). 
4.outputs false positive results and average execution times. 


Sample Scenario Outputs:
[FilterInfo] filter-bit-size:2019773, number of hash:14, max false positive prob.: 0.0001000
[Adding 200000 distinct random strings].......... done.
[Discarding 100000 strings].......... done.
[Detected false positive results]
   hfG   Thq4   zdfR   zJcz   6Gww   iHoc   ovam   jumy   wvzw 
[mesured false positive prob.] 0.0000900 (9 false positives found in 100000 trials)
[average time for     add] 0.007[ms] (average for 200000 times trials)
[average time for discard] 0.006[ms] (average for 100000 times trials)
[average time for   check] 0.006[ms] (average for 100000 times trials)


Sample SBT Outputs:
$sbt
[info] Building project bloom-filter 1.0 against Scala 2.8.1
[info]    using BloomFilterProject with sbt 0.7.5 and Scala 2.7.7
> test
[info] 
[info] == compile ==
[info]   Source analysis: 2 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Compilation successful.
[info]   Post-analysis: 37 classes.
[info] == compile ==
[info] 
[info] == copy-resources ==
[info] == copy-resources ==
[info] 
[info] == copy-test-resources ==
[info] == copy-test-resources ==
[info] 
[info] == test-compile ==
[info]   Source analysis: 1 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling test sources...
[info] Compilation successful.
[info]   Post-analysis: 22 classes.
[info] == test-compile ==
[info] 
[info] == test-start ==
[info] == test-start ==
[info] 
[info] == org.everpeace.util.CountingBloomFilterTest ==
[info]   + Counts does not change after add and discard.
[info]   + No false negative results.
[FilterInfo] filter-bit-size:2019773, number of hash:14, max false positive prob.: 0.0001000
[mesured false positive prob.] 0.0000900 (9 false positives found in 100000 trials)
[info]   + False positive prob. is less than max false positive prob. set.
[info] == org.everpeace.util.CountingBloomFilterTest ==
[info] 
[info] == test-complete ==
[info] == test-complete ==
[info] 
[info] == test-finish ==
[info] Passed: : Total 3, Failed 0, Errors 0, Passed 3, Skipped 0
[info]  
[info] All tests PASSED.
[info] == test-finish ==
[info] 
[info] == Test cleanup 1 ==
[info] Deleting directory /var/folders/LA/LAiTnWCkECC26KLkxKXrIE+++TI/-Tmp-/sbt_39625890
[info] == Test cleanup 1 ==
[info] 
[info] == test-cleanup ==
[info] == test-cleanup ==
[info] 
[info] == test ==
[info] == test ==
[success] Successful.
[info] 
[info] Total time: 19 s, completed 2011/05/04 22:46:21
> run 0.0001 100000
[info] 
[info] == copy-resources ==
[info] == copy-resources ==
[info] 
[info] == compile ==
[info]   Source analysis: 0 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Nothing to compile.
[info]   Post-analysis: 37 classes.
[info] == compile ==
[info] 
[info] == run ==
[info] Running org.everpeace.util.CountingBloomFilterSample 0.0001 100000
=========================================================================================================================================
Sample Senario:
0.initialize a CountingBloomFilter with max false positive probability is 0.0001000 and expected number of items is 100000
1.add distinct random 200000 strings the filter (twice of your input)
2.discard 100000 strings of added (your input)
3.check that the filter contains each filter. (outputs only false positive results)
-----------------------------------------------------------------------------------------------------------------------------------------
[FilterInfo] filter-bit-size:2019773, number of hash:14, max false positive prob.: 0.0001000
[Adding 200000 distinct random strings].......... done.
[Discarding 100000 strings].......... done.
[Detected false positive results]
   hfG   Thq4   zdfR   zJcz   6Gww   iHoc   ovam   jumy   wvzw 
[mesured false positive prob.] 0.0000900 (9 false positives found in 100000 trials)
[average time for     add] 0.007[ms] (average for 200000 times trials)
[average time for discard] 0.006[ms] (average for 100000 times trials)
[average time for   check] 0.006[ms] (average for 100000 times trials)
=========================================================================================================================================
[info] == run ==
[success] Successful.
[info] 
[info] Total time: 5 s, completed 2011/05/04 22:46:36

About

This is an implementation of Bloom Filter, especially CountingBloomFilter.

https://github.com/everpeace/bloom-filter


Languages

Language:Scala 100.0%