Bloom filter: what is wrong? Why does contains return false?
paulanalyst opened this issue · comments
Julia Version 0.3.0-prerelease+2599 (2014-04-11 23:52 UTC)
Commit bf7096c (51 days old master)
x86_64-w64-mingw32
julia> using BloomFilters
julia> n, k = 100, 50
(100,50)
julia> ety=readcsv("etykiety_kli.txt")
56522x1 Array{Any,2}:
"EC00113876"
"EC00085985"
"EC00037297"
"EC00005413"
"EC00126328"
"EC00021867"
"EC00114062"
"EC00007751"
"EC00206892"
"EC00115609"
⋮
"EC00159409"
"EC00172340"
"EC00062096"
"EC00134183"
"EC00108009"
"EC00050665"
"EC00081817"
"EC00155357"
"EC00031904"
"EC00060934"
julia> filter = BloomFilter(n, k)
A Bloom filter
- Mask Size: 100
- Number of Hashes: 50
julia> add!(filter, ety)
julia> x=ety[2]
"EC00085985"
julia> contains(filter,x)
false
Why is it false?
julia> findfirst(ety,x)
2
Paul
What's the SHA1 of the version of this package you're using?
BloomFilters,0.0.0
Ok. I think we may need to change the released version.
I downloaded it today.
Yes, the released version is not up-to-date.
Try updating now. You need Julia 0.3-
.
OK, updated. Now I have BloomFilters 0.0.1, but:
julia> using BloomFilters
julia> n, k = 100, 50
(100,50)
julia> filter = BloomFilter(n, k)
A Bloom filter
- Mask Size: 100
- Number of Hashes: 50
julia> ety=readcsv("etykiety_kli.txt")
56522x1 Array{Any,2}:
"EC00113876"
"EC00085985"
"EC00037297"
"EC00005413"
"EC00126328"
"EC00021867"
"EC00114062"
"EC00007751"
"EC00206892"
"EC00115609"
⋮
"EC00159409"
"EC00172340"
"EC00062096"
"EC00134183"
"EC00108009"
"EC00050665"
"EC00081817"
"EC00155357"
"EC00031904"
"EC00060934"
julia> add!(filter, ety)
julia> x=ety[2]
"EC00085985"
julia> contains(filter,x)
false
julia> findfirst(ety,x)
2
ety is an Array{Any,2}: ...
julia> add!(filter, ety)
Why is this true:
julia> contains(filter,ety)
true
???
julia> filter
A Bloom filter
- Mask Size: 100
- Number of Hashes: 50
julia> ety
56522x1 Array{Any,2}:
"EC00113876"
"EC00085985"
"EC00037297"
"EC00005413"
"EC00126328"
"EC00021867"
"EC00114062"
"EC00007751"
"EC00206892"
"EC00115609"
⋮
"EC00159409"
"EC00172340"
"EC00062096"
"EC00134183"
"EC00108009"
"EC00050665"
"EC00081817"
"EC00155357"
"EC00031904"
"EC00060934"
julia> contains(filter,vec(ety))
56522-element BitArray{1}:
true
true
true
true
true
true
true
true
true
true
⋮
true
true
true
true
true
true
true
true
true
true
julia>
After vec:
julia> contains(filter,"somethink")
true
:/
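A rough sanity check, not taken from the BloomFilters source: assuming "Mask Size" is the number of bits m and "Number of Hashes" is k, the textbook false-positive estimate (1 - exp(-k*n/m))^k for n inserted items suggests that a 100-bit mask is saturated long before 56522 labels are added, so any query, including "somethink", would come back true. The function name below is hypothetical, and the code targets current Julia (1.x) rather than 0.3.

```julia
# Textbook estimate of the false-positive rate of a Bloom filter with
# m bits and k hash functions after n distinct items have been added.
# Assumes independent, uniformly distributed hashes.
false_positive_rate(m, k, n) = (1 - exp(-k * n / m))^k

# Settings from this thread: 100-bit mask, 50 hashes, 56522 labels.
rate_full = false_positive_rate(100, 50, 56522)
println("estimated false-positive rate: ", rate_full)
```

With these settings the estimate is indistinguishable from 1, which matches the all-true results above.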
Ok. I'll look into this. It might take me a couple of weeks.
It works better if the data is not too long, e.g. add!(filter,ety[1:50]), and when n is high (around 10000):
julia> using BloomFilters
julia> n, k = 100, 50
(100,50)
julia> filter = BloomFilter(n, k)
A Bloom filter
- Mask Size: 100
- Number of Hashes: 50
julia> ety=readcsv("etykiety_kli.txt")
56522x1 Array{Any,2}:
"EC00113876"
"EC00085985"
"EC00037297"
"EC00005413"
"EC00126328"
"EC00021867"
"EC00114062"
"EC00007751"
"EC00206892"
"EC00115609"
⋮
"EC00159409"
"EC00172340"
"EC00062096"
"EC00134183"
"EC00108009"
"EC00050665"
"EC00081817"
"EC00155357"
"EC00031904"
"EC00060934"
julia> add!(filter,ety[1:50])
julia> x=ety[2]
"EC00085985"
julia> contains(filter,x)
true
julia> findfirst(ety,x)
2
julia> contains(filter,ety[1])
true
julia> contains(filter,ety[50])
true
julia> contains(filter,ety[51])
true
julia> contains(filter,ety[55])
false
julia> contains(filter,ety[505])
true
julia> using BloomFilters
julia> n, k = 10000, 50
(10000,50)
julia> filter = BloomFilter(n, k)
A Bloom filter
- Mask Size: 10000
- Number of Hashes: 50
julia> ety=readcsv("etykiety_kli.txt")
56522x1 Array{Any,2}:
"EC00113876"
"EC00085985"
"EC00037297"
"EC00005413"
"EC00126328"
"EC00021867"
"EC00114062"
"EC00007751"
"EC00206892"
"EC00115609"
⋮
"EC00159409"
"EC00172340"
"EC00062096"
"EC00134183"
"EC00108009"
"EC00050665"
"EC00081817"
"EC00155357"
"EC00031904"
"EC00060934"
julia> add!(filter,ety[1:50])
julia> x=ety[2]
"EC00085985"
julia> contains(filter,x)
true
julia> findfirst(ety,x)
2
julia> contains(filter,ety[505])
false
julia> contains(filter,ety[50])
true
julia> contains(filter,ety[51])
false
julia> contains(filter,ety[52])
false
julia> contains(filter,ety[53])
false
julia> contains(filter,ety[55])
false
julia> contains(filter,ety[49])
true
julia> contains(filter,ety[48])
true
julia> contains(filter,ety[47])
true
julia>
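The n = 10000 run behaves sensibly because 50 items with 50 hashes each can set at most 2500 of the 10000 bits. For the full list of 56522 labels, the mask would need to be far larger. Below is a hedged sketch of the usual sizing rule, assuming "Mask Size" means bits and a target false-positive rate p that you choose yourself. The helper names are hypothetical and are not part of BloomFilters. The code is written for current Julia (1.x).

```julia
# Standard Bloom filter sizing formulas.
# n: expected number of items, p: target false-positive rate.
bloom_bits(n, p) = ceil(Int, -n * log(p) / log(2)^2)

# Number of hash functions that minimizes false positives for m bits, n items.
bloom_hashes(m, n) = max(1, round(Int, m / n * log(2)))

n_items = 56522                    # number of labels in etykiety_kli.txt
m_bits  = bloom_bits(n_items, 0.01)
k_hash  = bloom_hashes(m_bits, n_items)
println("bits: ", m_bits, ", hashes: ", k_hash)
```

For a 1% target this gives roughly 540 thousand bits and 7 hashes, so both n = 100 and k = 50 are far from what this data set needs.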
The argument must be ety[:]:
OLD:
julia> filter = BloomFilter(n, k)
A Bloom filter
- Mask Size: 1000
- Number of Hashes: 50
julia> add!(filter,ety)
julia> x=ety[2]
"EC00085985"
julia> contains(filter,x)
false
NEW:
julia> filter = BloomFilter(n, k)
A Bloom filter
- Mask Size: 1000
- Number of Hashes: 50
julia> add!(filter,ety[:])
julia> x=ety[2]
"EC00085985"
julia> contains(filter,x)
true
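The OLD/NEW difference is consistent with the 56522x1 matrix being hashed as one single item, while a vector (ety[:] or vec(ety)) is added element by element. I have not checked the package source, so treat this as a guess. The toy filter below (TinyBloom, tiny_add!, tiny_contains are hypothetical names, not the BloomFilters API) reproduces the same behaviour in current Julia (1.x):

```julia
# Toy Bloom filter for illustration only.
struct TinyBloom
    bits::BitVector
    k::Int
end
TinyBloom(m::Integer, k::Integer) = TinyBloom(falses(m), k)

# Bit position for the i-th hash of `item`, using Base.hash with seed i.
slot(f::TinyBloom, item, i) = Int(hash(item, UInt(i)) % UInt(length(f.bits))) + 1

# Generic method: whatever is passed in is hashed as ONE item.
function tiny_add!(f::TinyBloom, item)
    for i in 1:f.k
        f.bits[slot(f, item, i)] = true
    end
    return f
end

# Vector method: each element is added separately.
function tiny_add!(f::TinyBloom, items::AbstractVector)
    for x in items
        tiny_add!(f, x)
    end
    return f
end

tiny_contains(f::TinyBloom, item) = all(f.bits[slot(f, item, i)] for i in 1:f.k)

labels = reshape(["EC00113876", "EC00085985", "EC00037297"], 3, 1)  # 3x1 matrix

f = TinyBloom(10_000, 5)
tiny_add!(f, labels)             # matrix is not a vector: stored as one item
println(tiny_contains(f, labels))      # the matrix itself is found

g = TinyBloom(10_000, 5)
tiny_add!(g, labels[:])          # vector: every label stored
println(tiny_contains(g, labels[2]))   # individual label is found
```

In the first filter a lookup of labels[2] will almost certainly fail, because only the bits for the whole matrix were set. That is the same pattern as contains(filter,ety) being true while contains(filter,x) is false.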
Try again with 0.1.0