eulerto / pg_similarity

set of functions and operators for executing similarity queries

gin problems

jhellerstein opened this issue

It appears that the gin opclasses don't guarantee correct results:

joe=# create index aix on foo using gin(a gin_similarity_ops);
CREATE INDEX
joe=# select a, b, lev(a,b) from foo, bar where a ~== b;
             a             |          b           |        lev        
---------------------------+----------------------+-------------------
 Euler Taveira de Oliveira | Euler T. de Oliveira |              0.76
 Euler                     | Euller               | 0.833333333333333
(2 rows)

joe=# explain select a, b, lev(a,b) from foo, bar where a ~== b;
                           QUERY PLAN                           
----------------------------------------------------------------
 Nested Loop  (cost=0.00..122.43 rows=7 width=64)
   Join Filter: (foo.a ~== bar.b)
   ->  Seq Scan on bar  (cost=0.00..23.10 rows=1310 width=32)
   ->  Materialize  (cost=0.00..1.07 rows=5 width=32)
         ->  Seq Scan on foo  (cost=0.00..1.05 rows=5 width=32)
(5 rows)

joe=# show enable_seqscan;
 enable_seqscan 
----------------
 on
(1 row)

joe=# set enable_seqscan = 'off';
SET
joe=# explain select a, b, lev(a,b) from foo, bar where a ~== b;
                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.01..10000005300.97 rows=7 width=64)
   ->  Seq Scan on bar  (cost=10000000000.00..10000000023.10 rows=1310 width=32)
   ->  Bitmap Heap Scan on foo  (cost=0.01..4.02 rows=1 width=32)
         Recheck Cond: (a ~== bar.b)
         ->  Bitmap Index Scan on aix  (cost=0.00..0.01 rows=1 width=0)
               Index Cond: (a ~== bar.b)
(6 rows)

joe=# select a, b, lev(a,b) from foo, bar where a ~== b;
             a             |          b           | lev  
---------------------------+----------------------+------
 Euler Taveira de Oliveira | Euler T. de Oliveira | 0.76
(1 row)

joe=# set enable_seqscan = 'on';
SET
joe=# select a, b, lev(a,b) from foo, bar where a ~== b;
             a             |          b           |        lev        
---------------------------+----------------------+-------------------
 Euler Taveira de Oliveira | Euler T. de Oliveira |              0.76
 Euler                     | Euller               | 0.833333333333333
(2 rows)

joe=# \q
(joe@3fac) pg_similarity > 

It was an oversight. Fixed in e699efb. The problem was that some operators can't use indexes. It seems soundex can use indexes, but I left that for another commit.
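
For anyone who wants to check an operator for this kind of mismatch, here is a rough sketch of comparing the sequential-scan and index-scan results of the join from the report in one transaction. It assumes the same `foo(a)` and `bar(b)` tables and the `aix` gin index; the temp table names `seq_result` and `idx_result` are made up for illustration. The planner settings only discourage a plan type, so it is worth confirming with `explain` that each query really uses the intended plan.

```sql
-- Sketch only: assumes tables foo(a) and bar(b) and the gin index "aix"
-- from the report above. seq_result and idx_result are hypothetical names.
BEGIN;

-- First pass: steer the planner away from the gin index (bitmap scan).
SET LOCAL enable_seqscan = on;
SET LOCAL enable_bitmapscan = off;
CREATE TEMP TABLE seq_result ON COMMIT DROP AS
    SELECT a, b FROM foo, bar WHERE a ~== b;

-- Second pass: steer the planner toward the gin index.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_bitmapscan = on;
CREATE TEMP TABLE idx_result ON COMMIT DROP AS
    SELECT a, b FROM foo, bar WHERE a ~== b;

-- Rows returned by only one of the two plans; an empty result means they agree.
SELECT 'seq scan only' AS found_by, a, b
  FROM (SELECT a, b FROM seq_result
        EXCEPT
        SELECT a, b FROM idx_result) AS s
UNION ALL
SELECT 'index scan only' AS found_by, a, b
  FROM (SELECT a, b FROM idx_result
        EXCEPT
        SELECT a, b FROM seq_result) AS i;

COMMIT;
```

With the data from the report, and before the fix, the `Euler` / `Euller` pair would be expected to show up as `seq scan only`, since it is the row missing from the index-scan output.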