Normalisation not applied when searching against tsv_document
doutatsu opened this issue · comments
I am using a tsvector column, as outlined in the documentation. While comparing the old and new approaches, I noticed that the pg rank was different, and the difference seemed to come from the normalisation. Looking at the generated queries, it does look like normalisation is part of the query in both cases, yet results continue to be different. Changing normalisation on the regular search changes the results, but it never does on the new search.
original configuration
pg_search_scope :search,
  against: { title: 'A', title_en: 'B', title_en_jp: 'C', alt_titles: 'D' },
  using: { tsearch: { prefix: true, normalization: 2 } }
and generated query
SELECT
  "titles".*
FROM
  "titles"
  INNER JOIN (
    SELECT
      "titles"."id" AS pg_search_id,
      (
        ts_rank(
          (
            setweight(
              to_tsvector(
                'simple',
                coalesce(
                  "titles"."title" :: text, ''
                )
              ),
              'A'
            ) || setweight(
              to_tsvector(
                'simple',
                coalesce(
                  "titles"."title_en" :: text,
                  ''
                )
              ),
              'B'
            ) || setweight(
              to_tsvector(
                'simple',
                coalesce(
                  "titles"."title_en_jp" :: text,
                  ''
                )
              ),
              'C'
            ) || setweight(
              to_tsvector(
                'simple',
                coalesce(
                  "titles"."alt_titles" :: text,
                  ''
                )
              ),
              'D'
            )
          ),
          (
            to_tsquery(
              'simple', ''' ' || 'Test' || ' ''' || ':*'
            )
          ),
          2
        )
      ) AS rank
    FROM
      "titles"
    WHERE
      (
        (
          setweight(
            to_tsvector(
              'simple',
              coalesce(
                "titles"."title" :: text, ''
              )
            ),
            'A'
          ) || setweight(
            to_tsvector(
              'simple',
              coalesce(
                "titles"."title_en" :: text,
                ''
              )
            ),
            'B'
          ) || setweight(
            to_tsvector(
              'simple',
              coalesce(
                "titles"."title_en_jp" :: text,
                ''
              )
            ),
            'C'
          ) || setweight(
            to_tsvector(
              'simple',
              coalesce(
                "titles"."alt_titles" :: text,
                ''
              )
            ),
            'D'
          )
        ) @@ (
          to_tsquery(
            'simple', ''' ' || 'Test' || ' ''' || ':*'
          )
        )
      )
  ) AS pg_search_d011be041e2abd01dae426 ON "titles"."id" = pg_search_d011be041e2abd01dae426.pg_search_id
ORDER BY
  pg_search_d011be041e2abd01dae426.rank DESC,
  "titles"."id" ASC
and here are the configuration and generated query using the tsvector column:
pg_search_scope :search,
  against: :tsv_document,
  using: {
    tsearch: {
      prefix: true,
      normalization: 2,
      tsvector_column: 'tsv_document',
    }
  }
SELECT
  "titles".*
FROM
  "titles"
WHERE
  "titles"."id" IN (
    SELECT
      "title_searches"."titles_id"
    FROM
      "title_searches"
      INNER JOIN (
        SELECT
          "title_searches"."titles_id" AS pg_search_id,
          (
            ts_rank(
              (
                "title_searches"."tsv_document"
              ),
              (
                to_tsquery(
                  'simple', ''' ' || 'Test' || ' ''' || ':*'
                )
              ),
              2
            )
          ) AS rank
        FROM
          "title_searches"
        WHERE
          (
            (
              "title_searches"."tsv_document"
            ) @@ (
              to_tsquery(
                'simple', ''' ' || 'Test' || ' ''' || ':*'
              )
            )
          )
      ) AS pg_search_581b082c20d9b9515ba8a4 ON "title_searches"."titles_id" = pg_search_581b082c20d9b9515ba8a4.pg_search_id
    ORDER BY
      pg_search_581b082c20d9b9515ba8a4.rank DESC,
      "title_searches"."titles_id" ASC
  )
Look closer. In both queries, the third argument to ts_rank is 2. The normalization is being applied.
I am aware, that's why I mentioned:
Looking at the generated queries, it does look like normalisation is part of the query in both cases, yet results continue to be different
Meaning even though in both cases normalization is applied, the results are different, while they should be the same.
I solved the problem a while ago, although I don't quite remember how at this point, so it's fine to close this ticket anyway.
My educated guess is that you probably originally missed the A/B/C/D weighting when building the tsvector column. It's important to get the correct expression when defining a trigger or generated column to build a tsvector.
I'm glad that you were able to figure it out!
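For anyone who lands here with the same symptom, here is a rough sketch of what "getting the expression right" could look like. PostgreSQL's ts_rank weighs D, C, B and A lexemes at 0.1, 0.2, 0.4 and 1.0 by default, and lexemes stored without setweight count as D, so an unweighted tsvector column will rank differently from the inline query even with the same normalization. The helper names below (weighted_tsvector_sql, tsvector_trigger_function_sql) and the function name are hypothetical and not part of pg_search. The sketch also assumes the source columns live on the same table as tsv_document, which may not match the schema in this issue.

```ruby
# Column => weight mapping copied from the original `against:` hash.
WEIGHTED_COLUMNS = {
  "title"       => "A",
  "title_en"    => "B",
  "title_en_jp" => "C",
  "alt_titles"  => "D"
}.freeze

# Hypothetical helper: builds the same setweight(...) || setweight(...)
# expression that appears in the generated query for the non-tsvector scope.
# `row` is the record reference; NEW is what a plpgsql row trigger exposes.
def weighted_tsvector_sql(columns, dictionary: "simple", row: "NEW")
  columns.map do |column, weight|
    "setweight(to_tsvector('#{dictionary}', " \
      "coalesce(#{row}.#{column}::text, '')), '#{weight}')"
  end.join(" || ")
end

# Hypothetical helper: wraps the expression in a trigger function body.
# A CREATE TRIGGER ... BEFORE INSERT OR UPDATE statement is still needed
# to attach it to the table (assumption: same table holds all columns).
def tsvector_trigger_function_sql(function_name, tsvector_column, columns)
  <<~SQL
    CREATE OR REPLACE FUNCTION #{function_name}() RETURNS trigger AS $$
    BEGIN
      NEW.#{tsvector_column} := #{weighted_tsvector_sql(columns)};
      RETURN NEW;
    END
    $$ LANGUAGE plpgsql;
  SQL
end

puts tsvector_trigger_function_sql(
  "titles_tsv_document_update", # hypothetical function name
  "tsv_document",
  WEIGHTED_COLUMNS
)
```

The printed SQL could be run from a migration with execute. Existing rows would also need a one-off UPDATE so that their stored tsvectors pick up the weights.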