HuangQiang / RQALSH_Mem

Memory Version of RQALSH for Furthest Neighbor Search (TKDE 2017)

[Question to RQALSH's result]

JieFengWang opened this issue

Hi Dr. Huang, I ran RQALSH and RQALSH_Mem separately on the SIFT1M dataset, on the same machine.
I have two questions about the results.

  • Why does avg_ratio@10 come out as an obviously wrong number (a huge negative value) for both methods?
  • Why does the in-memory version find better results than the on-disk version (i.e., higher recall and lower avg_ratio)?

The results are below.
(base) ➜ jfwang@HP  ~/proj/help_fns/RQALSH/methods git:(master) ✗ ./rqalsh -alg 4 -qn 10000 -d 128 -qs /home/jfwang/data/sift1m/sift1m_query.rawfbin -ts /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin  -df /home/jfwang/data/sift1m/fns/ -of /home/jfwang/data/sift1m/fns/
  alg           = 4
  qn            = 10000
  d             = 128
  query_set     = /home/jfwang/data/sift1m/sift1m_query.rawfbin
  truth_set     = /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin
  data_folder   = /home/jfwang/data/sift1m/fns/
  output_folder = /home/jfwang/data/sift1m/fns/

Read Data: 0.002825 Seconds

Read Ground Truth: 0.034793 Seconds

Parameters of RQALSH:
    n     = 1000000
    d     = 128
    B     = 4096
    beta  = 0.000100
    delta = 0.490000
    ratio = 2.0
    w     = 1.359556
    m     = 77
    l     = 33
    path  = /home/jfwang/data/sift1m/fns/indices/

Top-k FN Search by RQALSH: 
  Top-k         Ratio           I/O             Time (ms)       Recall
    1           1.0490          15397           87.09           3.67%
    2           1.0473          15616           88.14           4.65%
    5           1.0470          15882           89.77           4.90%
   10           -5192312821191349820094119149568.0000           16070           90.90           4.89%

 

(base) ➜ jfwang@HP  ~/proj/help_fns/RQALSH_Mem git:(master) ✗ ./rqalsh -alg 4 -n 1000000 -qn 10000 -d 128 -c 2.0 -ds /home/jfwang/data/sift1m/sift1m_base.rawfbin -qs /home/jfwang/data/sift1m/sift1m_query.rawfbin -ts /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin -op /home/jfwang/data/sift1m/fns/sift1m_searched_res        
alg       = 4
n         = 1000000
qn        = 10000
d         = 128
c         = 2.0
data_set  = /home/jfwang/data/sift1m/sift1m_base.rawfbin
query_set = /home/jfwang/data/sift1m/sift1m_query.rawfbin
truth_set = /home/jfwang/data/sift1m/fns/sift1m_gt_genby_rqlsh.rawfbin
out_path  = /home/jfwang/data/sift1m/fns/sift1m_searched_res

Read Data:  0.235536 Seconds
Read Query: 0.002316 Seconds
Read Truth: 0.029369 Seconds

Parameters of RQALSH:
    n     = 1000000
    d     = 128
    ratio = 2.0
    w     = 1.359556
    m     = 77
    l     = 33

Indexing Time = 12.757415 Seconds
Memory = 587.501038 MB

Top-k FN Search of RQALSH:
Top-k           Ratio           Time (ms)       Recall (%)      Fraction (%)
  1             1.0355          74.6545         5.21%           0.01%
  2             1.0368          74.6928         6.28%           0.01%
  5             1.0399          74.7599         6.17%           0.01%
 10             -5182759284901845213041003593728.0000           74.9524         5.90%           0.01%
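
To help narrow down the first question, a check like the one below could be run over the per-query distances before they are averaged. It is only a sketch under stated assumptions: it assumes the FN ratio is the ground-truth distance divided by the returned distance, and `RatioReport` and `checked_fn_ratio` are hypothetical names that do not exist in RQALSH. The idea is that a single non-finite or non-positive entry at rank 10 is enough to dominate the mean, so counting such entries shows whether the truth file or the result list is the source of the bad value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of RQALSH): summary of a ratio sanity check.
struct RatioReport {
    double      mean_ratio;  // mean over the valid entries only
    std::size_t n_valid;     // entries that contributed to the mean
    std::size_t n_bad;       // entries skipped as non-finite or non-positive
};

// truth_dist[i] and found_dist[i] are the distances at the same rank for the
// ground truth and for the search result of one query.
// Assumption: for furthest neighbor search, ratio = truth / found.
inline RatioReport checked_fn_ratio(const std::vector<float>& truth_dist,
                                    const std::vector<float>& found_dist) {
    assert(truth_dist.size() == found_dist.size());
    double sum = 0.0;
    std::size_t n_valid = 0, n_bad = 0;
    for (std::size_t i = 0; i < truth_dist.size(); ++i) {
        const float t = truth_dist[i];
        const float f = found_dist[i];
        // Reject NaN/inf, negative truth distances, and non-positive results.
        const bool ok = std::isfinite(t) && std::isfinite(f) &&
                        t >= 0.0f && f > 0.0f;
        if (!ok) { ++n_bad; continue; }
        sum += static_cast<double>(t) / f;
        ++n_valid;
    }
    return { n_valid ? sum / n_valid : 0.0, n_valid, n_bad };
}
```

If `n_bad` is non-zero only at rank 10, that would point at the tenth entry of each ground-truth or result list (for example, a truth file that stores fewer than 10 neighbors per query) rather than at the search itself.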

Additional question: when I increase the number of candidates, why doesn't the average query time increase along with it? In the runs below it even drops slightly.

# in `def.h`
const int   CANDIDATES    = 2048;           
// const int   CANDIDATES    = 1024;       
// const int   CANDIDATES    = 256;          
// const int   CANDIDATES    = 100;                   

Here are the search results:

candidate_size = 100
Top-k FN Search of RQALSH:
Top-k           Ratio           Time (ms)       Recall (%)      Fraction (%)
  1             1.0345          74.7528         6.50%           0.01%



candidate_size = 256
Top-k FN Search of RQALSH:
Top-k           Ratio           Time (ms)       Recall (%)      Fraction (%)
  1             1.0311          72.4820         7.96%           0.03%


candidate_size = 1024
Top-k FN Search of RQALSH:
Top-k           Ratio           Time (ms)       Recall (%)      Fraction (%)
  1             1.0265          65.3057         10.28%          0.10%