[Question about RQALSH's results]
JieFengWang opened this issue
Hi Dr. Huang, I have run RQALSH and RQALSH_Mem separately on the same machine with the SIFT1M dataset.
I have two questions about the results:
- Why is avg_ratio@10 (the Ratio at Top-k = 10) a huge negative number (about -5.2e30) for both methods?
- Why does the in-memory version find better results than the on-disk version (i.e., higher recall and lower avg_ratio)?
Below are the results.
(base) ➜ jfwang@HP ~/proj/help_fns/RQALSH/methods git:(master) ✗ ./rqalsh -alg 4 -qn 10000 -d 128 -qs /home/jfwang/data/sift1m/sift1m_query.rawfbin -ts /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin -df /home/jfwang/data/sift1m/fns/ -of /home/jfwang/data/sift1m/fns/
alg = 4
qn = 10000
d = 128
query_set = /home/jfwang/data/sift1m/sift1m_query.rawfbin
truth_set = /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin
data_folder = /home/jfwang/data/sift1m/fns/
output_folder = /home/jfwang/data/sift1m/fns/
Read Data: 0.002825 Seconds
Read Ground Truth: 0.034793 Seconds
Parameters of RQALSH:
n = 1000000
d = 128
B = 4096
beta = 0.000100
delta = 0.490000
ratio = 2.0
w = 1.359556
m = 77
l = 33
path = /home/jfwang/data/sift1m/fns/indices/
Top-k FN Search by RQALSH:
Top-k Ratio I/O Time (ms) Recall
1 1.0490 15397 87.09 3.67%
2 1.0473 15616 88.14 4.65%
5 1.0470 15882 89.77 4.90%
10 -5192312821191349820094119149568.0000 16070 90.90 4.89%
(base) ➜ jfwang@HP ~/proj/help_fns/RQALSH_Mem git:(master) ✗ ./rqalsh -alg 4 -n 1000000 -qn 10000 -d 128 -c 2.0 -ds /home/jfwang/data/sift1m/sift1m_base.rawfbin -qs /home/jfwang/data/sift1m/sift1m_query.rawfbin -ts /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin -op /home/jfwang/data/sift1m/fns/sift1m_searched_res
alg = 4
n = 1000000
qn = 10000
d = 128
c = 2.0
data_set = /home/jfwang/data/sift1m/sift1m_base.rawfbin
query_set = /home/jfwang/data/sift1m/sift1m_query.rawfbin
truth_set = /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin
out_path = /home/jfwang/data/sift1m/fns/sift1m_searched_res
Read Data: 0.235536 Seconds
Read Query: 0.002316 Seconds
Read Truth: 0.029369 Seconds
Parameters of RQALSH:
n = 1000000
d = 128
ratio = 2.0
w = 1.359556
m = 77
l = 33
Indexing Time = 12.757415 Seconds
Memory = 587.501038 MB
Top-k FN Search of RQALSH:
Top-k Ratio Time (ms) Recall (%) Fraction (%)
1 1.0355 74.6545 5.21% 0.01%
2 1.0368 74.6928 6.28% 0.01%
5 1.0399 74.7599 6.17% 0.01%
10 -5182759284901845213041003593728.0000 74.9524 5.90% 0.01%
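Additional question: why does the average query time not grow with the candidate size? When I increase CANDIDATES in `def.h`, the time per query does not increase; in the results below it actually drops from about 74.8 ms to 65.3 ms as the candidate size goes from 100 to 1024.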
// in def.h
const int CANDIDATES = 2048;
// const int CANDIDATES = 1024;
// const int CANDIDATES = 256;
// const int CANDIDATES = 100;
Here are the search results:
candidate_size = 100
Top-k FN Search of RQALSH:
Top-k Ratio Time (ms) Recall (%) Fraction (%)
1 1.0345 74.7528 6.50% 0.01%
candidate_size = 256
Top-k FN Search of RQALSH:
Top-k Ratio Time (ms) Recall (%) Fraction (%)
1 1.0311 72.4820 7.96% 0.03%
candidate_size = 1024
Top-k FN Search of RQALSH:
Top-k Ratio Time (ms) Recall (%) Fraction (%)
1 1.0265 65.3057 10.28% 0.10%