Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search

Home Page: http://www.casebook.net

Normalisation not applied when searching against tsv_document

doutatsu opened this issue

I am using a tsvector column, as outlined in the documentation. While comparing the old and new approaches, I noticed that the pg_search rank was different, and the difference came from normalisation. Looking at the generated queries, normalisation does appear to be part of the query in both cases, yet the results are still different. Changing the normalisation on the regular search changes the results, but it never changes them on the tsvector-based search.

Original configuration:

  pg_search_scope :search,
                  against: { title: 'A', title_en: 'B', title_en_jp: 'C', alt_titles: 'D' },
                  using: { tsearch: { prefix: true, normalization: 2 } }

and the generated query:

SELECT 
  "titles".* 
FROM 
  "titles" 
  INNER JOIN (
    SELECT 
      "titles"."id" AS pg_search_id, 
      (
        ts_rank(
          (
            setweight(
              to_tsvector(
                'simple', 
                coalesce(
                  "titles"."title" :: text, ''
                )
              ), 
              'A'
            ) || setweight(
              to_tsvector(
                'simple', 
                coalesce(
                  "titles"."title_en" :: text, 
                  ''
                )
              ), 
              'B'
            ) || setweight(
              to_tsvector(
                'simple', 
                coalesce(
                  "titles"."title_en_jp" :: text, 
                  ''
                )
              ), 
              'C'
            ) || setweight(
              to_tsvector(
                'simple', 
                coalesce(
                  "titles"."alt_titles" :: text, 
                  ''
                )
              ), 
              'D'
            )
          ), 
          (
            to_tsquery(
              'simple', ''' ' || 'Test' || ' ''' || ':*'
            )
          ), 
          2
        )
      ) AS rank 
    FROM 
      "titles" 
    WHERE 
      (
        (
          setweight(
            to_tsvector(
              'simple', 
              coalesce(
                "titles"."title" :: text, ''
              )
            ), 
            'A'
          ) || setweight(
            to_tsvector(
              'simple', 
              coalesce(
                "titles"."title_en" :: text, 
                ''
              )
            ), 
            'B'
          ) || setweight(
            to_tsvector(
              'simple', 
              coalesce(
                "titles"."title_en_jp" :: text, 
                ''
              )
            ), 
            'C'
          ) || setweight(
            to_tsvector(
              'simple', 
              coalesce(
                "titles"."alt_titles" :: text, 
                ''
              )
            ), 
            'D'
          )
        ) @@ (
          to_tsquery(
            'simple', ''' ' || 'Test' || ' ''' || ':*'
          )
        )
      )
  ) AS pg_search_d011be041e2abd01dae426 ON "titles"."id" = pg_search_d011be041e2abd01dae426.pg_search_id 
ORDER BY 
  pg_search_d011be041e2abd01dae426.rank DESC, 
  "titles"."id" ASC

And here are the configuration and the generated query using the tsvector column:

  pg_search_scope :search,
                  against: :tsv_document,
                  using: {
                    tsearch: {
                      prefix: true,
                      normalization: 2,
                      tsvector_column: 'tsv_document',
                    }
                  }

SELECT 
  "titles".*
FROM 
  "titles" 
WHERE 
  "titles"."id" IN (
    SELECT 
      "title_searches"."titles_id" 
    FROM 
      "title_searches" 
      INNER JOIN (
        SELECT 
          "title_searches"."titles_id" AS pg_search_id, 
          (
            ts_rank(
              (
                "title_searches"."tsv_document"
              ), 
              (
                to_tsquery(
                  'simple', ''' ' || 'Test' || ' ''' || ':*'
                )
              ), 
              2
            )
          ) AS rank 
        FROM 
          "title_searches" 
        WHERE 
          (
            (
              "title_searches"."tsv_document"
            ) @@ (
              to_tsquery(
                'simple', ''' ' || 'Test' || ' ''' || ':*'
              )
            )
          )
      ) AS pg_search_581b082c20d9b9515ba8a4 ON "title_searches"."titles_id" = pg_search_581b082c20d9b9515ba8a4.pg_search_id 
    ORDER BY 
      pg_search_581b082c20d9b9515ba8a4.rank DESC, 
      "title_searches"."titles_id" ASC
  )

Look closer. In both queries, the third argument to ts_rank is 2. The normalization is being applied.
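That third argument is PostgreSQL's ts_rank normalization bit mask, which pg_search passes through from the `normalization:` option. Normalization only rescales the rank of whatever tsvector ts_rank is given, so identical settings can still produce different scores if the two tsvectors themselves differ. As a quick reference, here is a small Ruby sketch that decodes the flags as the PostgreSQL full text search documentation describes them; the constant and the `describe_normalization` helper are hypothetical and not part of pg_search.

```ruby
# Normalization flags accepted by PostgreSQL's ts_rank / ts_rank_cd, paraphrased
# from the PostgreSQL full text search documentation. The value is a bit mask,
# so several flags can be combined (e.g. 2 | 8).
TS_RANK_NORMALIZATION_FLAGS = {
  1 => "divide the rank by 1 + the logarithm of the document length",
  2 => "divide the rank by the document length",
  4 => "divide the rank by the mean harmonic distance between extents (ts_rank_cd only)",
  8 => "divide the rank by the number of unique words in the document",
  16 => "divide the rank by 1 + the logarithm of the number of unique words",
  32 => "divide the rank by itself + 1"
}.freeze

# Hypothetical helper (not part of pg_search): list the behaviours enabled by a
# normalization value such as the `normalization: 2` option used in this issue.
def describe_normalization(value)
  return ["ignore the document length (the default, 0)"] if value.zero?

  # Keep every flag whose bit is set in `value`, in ascending bit order.
  TS_RANK_NORMALIZATION_FLAGS.select { |bit, _| (value & bit) != 0 }.values
end

p describe_normalization(2)
# => ["divide the rank by the document length"]
```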

I am aware; that's why I mentioned:

Looking at the generated queries, it does look like normalisation is part of the query in both cases, yet results continue to be different

That is, even though normalization is applied in both cases, the results differ when they should be the same.

I solved the problem a while ago, although I don't quite remember how at this point, so it's fine to close this ticket anyway.

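My educated guess is that you originally left out the A/B/C/D weighting when building the tsvector column. It's important to use the correct expression when defining the trigger or generated column that builds the tsvector: it should include the same setweight() calls that pg_search generates in the first query, because lexemes produced by a plain to_tsvector() all carry the default weight D and therefore rank differently.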
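To make the stored column match the on-the-fly query, the weighted expression from the first generated query can be reproduced verbatim. The sketch below assumes the same `'simple'` dictionary and the column-to-weight hash from the original `against:` option; `weighted_tsvector_sql` is a hypothetical helper for illustration, not a pg_search API.

```ruby
# Hypothetical helper (not part of pg_search): rebuild the weighted tsvector
# expression that pg_search generates on the fly for a hash-style `against:`
# option, so the same expression can be reused wherever tsv_document is built.
# Identifiers are interpolated as-is, so only pass trusted column names.
def weighted_tsvector_sql(weights, table: nil, dictionary: "simple")
  weights.map do |column, weight|
    ref = table ? %("#{table}"."#{column}") : %("#{column}")
    # Same shape as the first generated query: setweight(to_tsvector(...), 'A') || ...
    %(setweight(to_tsvector('#{dictionary}', coalesce(#{ref}::text, '')), '#{weight}'))
  end.join(" || ")
end

expression = weighted_tsvector_sql(
  { title: "A", title_en: "B", title_en_jp: "C", alt_titles: "D" },
  table: "titles"
)
puts expression
```

In the issue, tsv_document lives on a separate title_searches relation, so where this string belongs (a trigger function, a `GENERATED ALWAYS AS (...) STORED` column on PostgreSQL 12+, or whatever defines title_searches) depends on your schema and is an assumption here. The essential point is that the stored tsvector carries the same weights as the expression pg_search would otherwise build.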

I'm glad that you were able to figure it out!