Using generated columns instead of triggers
ClayShentrup opened this issue · comments
Clay Shentrup commented
Hi. I think I did this correctly, but our searches are awfully slow and I'm trying to verify that I've done things correctly. Here are the indexes.
execute(<<-SQL.squish)
ALTER TABLE pg_search_documents
ADD COLUMN tsvector_content_dmetaphone tsvector GENERATED ALWAYS AS (
to_tsvector('simple', pg_search_dmetaphone(coalesce("pg_search_documents"."content"::text, '')))
) STORED;
SQL
add_index(:pg_search_documents, :tsvector_content_dmetaphone, using: :gin)
execute(<<-SQL.squish)
ALTER TABLE pg_search_documents
ADD COLUMN tsvector_content_tsearch tsvector GENERATED ALWAYS AS (
to_tsvector('english', coalesce("pg_search_documents"."content"::text, ''))
) STORED;
SQL
add_index(:pg_search_documents, :tsvector_content_tsearch, using: :gin)
Here's the initialization.
PgSearch.multisearch_options = {
ranked_by: ':tsearch + :dmetaphone',
using: {
dmetaphone: {
tsvector_column: 'tsvector_content_dmetaphone',
},
tsearch: {
dictionary: 'english',
tsvector_column: 'tsvector_content_tsearch',
highlight: {
StartSel: '<strong>',
StopSel: '</strong>',
},
},
},
}
But with only ~40,000 records, it's still quite slow with dmetaphone. Tsearch seems fast.
deaqm92vildrji=> explain analyze SELECT Count(*) from "pg_search_documents" WHERE "pg_search_documents"."tsvector_content_tsearch" @@ to_tsquery('english', ''' ' || 'DNA' || ' ''');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=324.74..324.74 rows=1 width=8) (actual time=2.210..2.211 rows=1 loops=1)
-> Bitmap Heap Scan on pg_search_documents (cost=12.32..324.63 rows=206 width=0) (actual time=1.006..1.762 rows=9010 loops=1)
Recheck Cond: (tsvector_content_tsearch @@ '''dna'''::tsquery)
Heap Blocks: exact=621
-> Bitmap Index Scan on index_pg_search_documents_on_tsvector_content_tsearch (cost=0.00..12.31 rows=206 width=0) (actual time=0.937..0.937 rows=9010 loops=1)
Index Cond: (tsvector_content_tsearch @@ '''dna'''::tsquery)
Planning Time: 0.166 ms
Execution Time: 2.255 ms
(8 rows)
deaqm92vildrji=> explain analyze SELECT Count(*) from "pg_search_documents" WHERE "pg_search_documents"."tsvector_content_dmetaphone" @@ to_tsquery('simple', ''' ' || Pg_search_dmetaphone('DNA') || ' ''');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1509.91..1509.91 rows=1 width=8) (actual time=71900.280..71900.282 rows=1 loops=1)
-> Seq Scan on pg_search_documents (cost=0.00..1494.04 rows=31747 width=0) (actual time=52.202..71877.651 rows=31740 loops=1)
Filter: (tsvector_content_dmetaphone @@ '''tn'''::tsquery)
Rows Removed by Filter: 9413
Planning Time: 1.194 ms
Execution Time: 71901.433 ms
(6 rows)
Grant Hutchins commented
You're right, you definitely don't want to see that Seq Scan. I'm not sure why it's not using your index.