tris790 / bloom-filter

A Python implementation of the Bloom filter data structure

bloom-filter

Bloom filter implementation for BIN-702
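
For readers new to the structure, here is a minimal sketch of a Bloom filter with the three operations the benchmark times (insert, find, clear). It is not the code from this repository: the class name `BloomFilterSketch` and its method names are hypothetical, and it derives bit positions from a SHA-256 digest in the standard library rather than from mmh3, so that it runs without extra dependencies.

```python
import hashlib


class BloomFilterSketch:
    """Illustrative Bloom filter; not the repository's implementation."""

    def __init__(self, bit_count: int, hash_count: int) -> None:
        self.bit_count = bit_count
        self.hash_count = hash_count
        # One byte holds eight bits, so round the byte length up.
        self.bits = bytearray((bit_count + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive hash_count positions from two 64-bit values
        # taken from one SHA-256 digest (an assumed stand-in for mmh3).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.bit_count

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def find(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def clear(self) -> None:
        self.bits = bytearray(len(self.bits))


bloom = BloomFilterSketch(bit_count=9586, hash_count=7)
bloom.insert("ACGT")
print(bloom.find("ACGT"))  # True: an inserted item is always reported
```

A lookup for an item that was never inserted usually returns False, but it can return True when all of its bit positions happen to be set by other items. That is the false-positive rate the filter trades for its small memory footprint.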

Getting Started

> Clone the repository  
> Run main.py (in src/) to start the benchmarks

Prerequisites

Python 3.x
mmh3 library (install with: pip install mmh3)

Running the benchmark

> cd src/  
> python main.py
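
The results report memory in mb and peak mb and timings in ms. This README does not say how main.py collects those numbers, so the following is only a sketch of one plausible approach, assuming the standard-library `tracemalloc` and `time.perf_counter`. The helper name `measure` is hypothetical.

```python
import time
import tracemalloc


def measure(operation):
    """Return (current_mb, peak_mb, elapsed_ms) for one call of operation."""
    tracemalloc.start()
    start = time.perf_counter()
    result = operation()  # keep a reference so the structure stays allocated
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
    tracemalloc.stop()
    del result
    return current / 1e6, peak / 1e6, elapsed_ms


current_mb, peak_mb, insert_ms = measure(lambda: set(range(100_000)))
print(f"{current_mb:.6f} mb, {peak_mb:.6f} peak mb, {insert_ms:.4f} ms")
```

Timings of exactly 0 ms in the results suggest a coarse clock. `time.perf_counter` has a much finer resolution, so small workloads would register as non-zero with this sketch.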

Benchmark results

| Set (elements) | MB | peak MB | insert ms | clear ms | find ms |
|---|---|---|---|---|---|
| 1000 | 0.000296 | 0.041276 | 0.9889 | 0 | 0 |
| 10000 | 0.000296 | 0.655676 | 1.9998 | 0 | 0 |
| 100000 | 0.000324 | 6.291772 | 19.9799 | 1.9973 | 8.9914 |
| 2000000 | 0.000324 | 100.663612 | 457.532 | 50.9484 | 203.3379 |

| Bloom (elements) | MB | peak MB | insert ms | clear ms | find ms | error prob | hash count | bit count | error count |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.010876 | 0.011449 | 15.9934 | 0 | 15.9833 | 0.001 | 7 | 9586 | 1 |
| 10000 | 0.0993 | 0.099879 | 179.8307 | 1.9978 | 155.8389 | 0.0021 | 7 | 95851 | 21 |
| 100000 | 1.04424 | 1.044825 | 1634.4944 | 44.9578 | 1571.1436 | 0.00154 | 7 | 958506 | 154 |
| 2000000 | 19.83744 | 19.838031 | 35077.5659 | 1017.1684 | 35077.5659 | 0.0016655 | 7 | 19170117 | 3331 |

| Bloom (elements) | MB | peak MB | insert ms | clear ms | find ms | error prob | hash count | bit count | error count |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.000296 | 0.041276 | 11.9879 | 0 | 12.9987 | 0.003 | 5 | 9586 | 3 |
| 10000 | 0.0993 | 0.099882 | 170.8237 | 1.9972 | 126.8831 | 0.0037 | 5 | 95851 | 37 |
| 100000 | 1.04424 | 1.044825 | 1246.7373 | 48.9547 | 1311.182 | 0.00221 | 5 | 958506 | 221 |
| 2000000 | 19.83744 | 19.838034 | 25255.1331 | 1037.1826 | 23715.8606 | 0.002236 | 5 | 19170117 | 4472 |

| Bloom (elements) | MB | peak MB | insert ms | clear ms | find ms | error prob | hash count | bit count | error count |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.00398 | 0.004556 | 12.9867 | 0.9983 | 15.9839 | 0.108 | 5 | 2949 | 108 |
| 10000 | 0.030796 | 0.031378 | 122.8838 | 0.9991 | 115.8721 | 0.1099 | 5 | 29492 | 1099 |
| 100000 | 0.321784 | 0.322372 | 1205.3608 | 11.9547 | 1442.8068 | 0.10759 | 5 | 294924 | 10759 |
| 2000000 | 6.10908 | 6.109674 | 24957.1912 | 330.6803 | 24417.6577 | 0.1064285 | 5 | 5898497 | 212857 |
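
The bit count and hash count columns can be related to the usual Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n items and a target false-positive probability p. The sketch below is an assumption about how such parameters are typically derived, not a copy of the repository's code. The function name `bloom_parameters` is hypothetical. With p = 0.01 it reproduces the bit counts and the hash count of 7 reported in the first Bloom results, which suggests, but does not confirm, that the benchmark used that target.

```python
import math


def bloom_parameters(item_count: int, error_prob: float):
    """Standard sizing: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2."""
    bit_count = math.ceil(-item_count * math.log(error_prob) / math.log(2) ** 2)
    hash_count = round(bit_count / item_count * math.log(2))
    return bit_count, hash_count


for n in (1000, 10000, 100000, 2000000):
    print(n, bloom_parameters(n, 0.01))
# 1000 (9586, 7)
# 10000 (95851, 7)
# 100000 (958506, 7)
# 2000000 (19170117, 7)
```

In the results, the error prob column equals error count divided by the number of elements (for example 3331 / 2000000 = 0.0016655), so it is a measured rate and not the design target p.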

Possible Applications in Bioinformatics

  • Sequence characterization
  • Genome assembly
  • Sequencing error correction
  • RNA-Seq

License

MIT

Authors

Tristan Deschamps
