Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search

Home Page: http://www.casebook.net

Using generated columns instead of triggers

ClayShentrup opened this issue:

Hi. I think I set this up correctly, but our searches are awfully slow, and I'd like to verify the setup. Here are the generated columns and their indexes.

execute(<<-SQL.squish)
  ALTER TABLE pg_search_documents
  ADD COLUMN tsvector_content_dmetaphone tsvector GENERATED ALWAYS AS (
    to_tsvector('simple', pg_search_dmetaphone(coalesce("pg_search_documents"."content"::text, '')))
  ) STORED;
SQL
add_index(:pg_search_documents, :tsvector_content_dmetaphone, using: :gin)

execute(<<-SQL.squish)
  ALTER TABLE pg_search_documents
  ADD COLUMN tsvector_content_tsearch tsvector GENERATED ALWAYS AS (
    to_tsvector('english', coalesce("pg_search_documents"."content"::text, ''))
  ) STORED;
SQL
add_index(:pg_search_documents, :tsvector_content_tsearch, using: :gin)

Here's the initialization.

PgSearch.multisearch_options = {
  ranked_by: ':tsearch + :dmetaphone',
  using: {
    dmetaphone: {
      tsvector_column: 'tsvector_content_dmetaphone',
    },
    tsearch: {
      dictionary: 'english',
      tsvector_column: 'tsvector_content_tsearch',
      highlight: {
        StartSel: '<strong>',
        StopSel: '</strong>',
      },
    },
  },
}

But with only ~40,000 records, searching with dmetaphone is still very slow (about 72 seconds in the second plan below), while tsearch is fast (about 2 ms).

deaqm92vildrji=> explain analyze SELECT Count(*) from "pg_search_documents" WHERE "pg_search_documents"."tsvector_content_tsearch" @@ to_tsquery('english', ''' ' || 'DNA' || ' ''');
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=324.74..324.74 rows=1 width=8) (actual time=2.210..2.211 rows=1 loops=1)
   ->  Bitmap Heap Scan on pg_search_documents  (cost=12.32..324.63 rows=206 width=0) (actual time=1.006..1.762 rows=9010 loops=1)
         Recheck Cond: (tsvector_content_tsearch @@ '''dna'''::tsquery)
         Heap Blocks: exact=621
         ->  Bitmap Index Scan on index_pg_search_documents_on_tsvector_content_tsearch  (cost=0.00..12.31 rows=206 width=0) (actual time=0.937..0.937 rows=9010 loops=1)
               Index Cond: (tsvector_content_tsearch @@ '''dna'''::tsquery)
 Planning Time: 0.166 ms
 Execution Time: 2.255 ms
(8 rows)

deaqm92vildrji=> explain analyze SELECT Count(*) from "pg_search_documents" WHERE "pg_search_documents"."tsvector_content_dmetaphone" @@ to_tsquery('simple', ''' ' || Pg_search_dmetaphone('DNA') || ' ''');
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1509.91..1509.91 rows=1 width=8) (actual time=71900.280..71900.282 rows=1 loops=1)
   ->  Seq Scan on pg_search_documents  (cost=0.00..1494.04 rows=31747 width=0) (actual time=52.202..71877.651 rows=31740 loops=1)
         Filter: (tsvector_content_dmetaphone @@ '''tn'''::tsquery)
         Rows Removed by Filter: 9413
 Planning Time: 1.194 ms
 Execution Time: 71901.433 ms
(6 rows)
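
As a quick sanity check on the two plans above, the actual row counts can be turned into match fractions. This is only arithmetic on the numbers printed by EXPLAIN ANALYZE. The table size is inferred from the second plan (rows returned plus rows removed by the filter), and the helper name `match_fraction` is made up for this sketch.

```ruby
# Row counts copied from the two EXPLAIN ANALYZE outputs above.
DMETAPHONE_MATCHED = 31_740 # Seq Scan: actual rows
DMETAPHONE_REMOVED = 9_413  # Seq Scan: Rows Removed by Filter
TSEARCH_MATCHED    = 9_010  # Bitmap Heap Scan: actual rows

# A sequential scan visits every row, so matched + removed is the table size.
TOTAL_ROWS = DMETAPHONE_MATCHED + DMETAPHONE_REMOVED

# Hypothetical helper: share of the table matched by a query, as a percentage.
def match_fraction(matched, total)
  (100.0 * matched / total).round(1)
end

puts "total rows:      #{TOTAL_ROWS}"
puts "tsearch 'dna':   #{match_fraction(TSEARCH_MATCHED, TOTAL_ROWS)}%"
puts "dmetaphone 'tn': #{match_fraction(DMETAPHONE_MATCHED, TOTAL_ROWS)}%"
```

The dmetaphone query matches roughly three quarters of the table, and the planner's estimate (rows=31747) is close to the actual count (31740), so the statistics do not look stale. A predicate that unselective is the kind where PostgreSQL's planner commonly prefers a sequential scan over a bitmap scan. That would account for the plan choice, though not by itself for the 72-second runtime.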

You're right, you definitely don't want to see that Seq Scan. I'm not sure why it's not using your index.
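
One way to narrow this down is to check whether the planner can use the GIN index at all, as opposed to simply choosing not to. The sketch below only builds the SQL statements as strings so they can be pasted into psql. `index_diagnostics` is a hypothetical helper, not part of pg_search, and it interpolates its arguments without quoting, so it is meant for hand-typed, trusted values only. The statements themselves are standard PostgreSQL: ANALYZE, SET enable_seqscan, and EXPLAIN (ANALYZE, BUFFERS).

```ruby
# Hypothetical helper (not part of pg_search): builds the statements for a
# psql session that checks whether the GIN index on `column` is usable.
# Arguments are interpolated as-is, so pass only trusted, hand-written values.
def index_diagnostics(table:, column:, tsquery:)
  explain = "EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM #{table} " \
            "WHERE #{column} @@ #{tsquery};"
  [
    "ANALYZE #{table};",         # refresh planner statistics
    "SET enable_seqscan = off;", # discourage sequential scans in this session
    explain,                     # see whether an index scan is chosen now
    "RESET enable_seqscan;"      # restore the default planner setting
  ]
end

# Print the statements for the slow dmetaphone query from this issue.
puts index_diagnostics(
  table: "pg_search_documents",
  column: "tsvector_content_dmetaphone",
  tsquery: "to_tsquery('simple', ''' ' || pg_search_dmetaphone('DNA') || ' ''')"
)
```

If the plan switches to a Bitmap Index Scan with enable_seqscan off, the index is usable and the planner was choosing the sequential scan on cost grounds. The BUFFERS output shows how many pages each plan node touched, which may help show where the time is going.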