Casecommons / pg_search

pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search

Home page: http://www.casebook.net

Some words in the Spanish dictionary don't even match themselves

silva96 opened this issue.

PostgreSQL version: 13
Ruby version: 2.7.2
gem version: 2.3.5

context "with the spanish dictionary" do
  before do
    ModelWithPgSearch.pg_search_scope :search_content_with_spanish,
                                      against: :content,
                                      using: {
                                        tsearch: { dictionary: :spanish }
                                      }
  end

  it "returns rows that match the query when stemmed by the spanish dictionary" do
    included = [ModelWithPgSearch.create!(content: "saltar"),
                ModelWithPgSearch.create!(content: "salté"),
                ModelWithPgSearch.create!(content: "saltando")]

    results = ModelWithPgSearch.search_content_with_spanish("saltar")
    expect(results).to match_array(included)
  end

  it "returns rows that match the query 'pedido' when stemmed by the spanish dictionary" do
    included = [ModelWithPgSearch.create!(content: "pedir"),
                ModelWithPgSearch.create!(content: "pedido")]

    results = ModelWithPgSearch.search_content_with_spanish("pedido")
    expect(results).to match_array(included)
  end

  it "returns rows that match the exact query with the spanish dictionary" do
    included = [ModelWithPgSearch.create!(content: "sentir"),
                ModelWithPgSearch.create!(content: "sentido")]

    results = ModelWithPgSearch.search_content_with_spanish("sentido")
    expect(results).to match_array(included)
  end
end

The last test fails, which is strange: the word is not only left unstemmed, it does not even match an identical query.

Running ts_debug on the examples:

SELECT * FROM ts_debug('spanish', 'pedidos');

SELECT * FROM ts_debug('spanish', 'sentidos');

Closing as there is nothing we can do on the pg_search gem side to address this issue.

Please look into the stop word issue mentioned in this Stack Overflow comment.

Good luck!
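To see why even the exact word fails, here is a toy Ruby model of stop-word filtering. PostgreSQL drops stop words when it builds both the tsvector for a row and the tsquery for the search term, and a tsquery with no lexemes left matches nothing. The sketch assumes, as the workaround later in this thread does, that "sentido" and "estado" are on the Spanish stop list. It ignores stemming entirely and is not how PostgreSQL is implemented; HYPOTHETICAL_STOP_WORDS, lexemes and matches? are illustrative names.

```ruby
require "set"

# Toy model only: real PostgreSQL parsing, stemming and stop lists are far richer.
# HYPOTHETICAL_STOP_WORDS stands in for the dictionary's stop list.
HYPOTHETICAL_STOP_WORDS = Set.new(%w[sentido estado]).freeze

# Lowercase, split into words, and discard anything on the stop list.
# No stemming is modelled here.
def lexemes(text)
  text.downcase.scan(/[[:alpha:]]+/).reject { |w| HYPOTHETICAL_STOP_WORDS.include?(w) }
end

# A document matches when the query keeps at least one lexeme and every
# query lexeme occurs in the document (terms are ANDed together).
def matches?(document, query)
  query_lexemes = lexemes(query)
  return false if query_lexemes.empty? # an empty query matches nothing

  document_lexemes = lexemes(document)
  query_lexemes.all? { |lexeme| document_lexemes.include?(lexeme) }
end

matches?("pedido", "pedido")   # => true
matches?("sentido", "sentido") # => false, both sides lose their only word
```

In this model "sentido" disappears from the row and from the query alike, so neither the identical row nor "sentir" can be returned, which mirrors the failing spec.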

Just to add my two cents, here's my workaround:

With the following search configuration, I select the dictionary dynamically through a service class, DictionarySelector:

pg_search_scope :full_text_search, lambda { |query, locale|
  {
    against: {
      cached_tag_list: 'A',
      title: 'B',
      plain_description: 'C'
    },
    using: {
      tsearch: { dictionary: DictionarySelector.call(locale, query) }
    },
    ignoring: :accents,
    query: query
  }
}

Inside DictionarySelector, I list each word I don't want to lose as a stop word:

  EXCLUDED_STOPWORDS = {
    es: %w[sentido estado]
  }.freeze

Then I decide whether to use the language-specific dictionary or the simple one:

return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }
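The check above uses query.include?(word), which is a case-sensitive substring test: "estadounidense" contains "estado" and would trigger the fallback, while "Sentido" would not. A possible variant, sketched here with a hypothetical helper named excluded_word_in_query?, compares whole lowercase words instead.

```ruby
# Same shape as the EXCLUDED_STOPWORDS constant in the thread.
EXCLUDED_STOPWORDS = {
  es: %w[sentido estado]
}.freeze

# Hypothetical helper: true when any excluded word appears in the query as a
# whole word, ignoring case. A nil query is treated as empty.
def excluded_word_in_query?(locale, query)
  words = query.to_s.downcase.scan(/[[:alpha:]]+/)
  (EXCLUDED_STOPWORDS[locale].to_a & words).any?
end

excluded_word_in_query?(:es, "Sentido común")  # => true
excluded_word_in_query?(:es, "estadounidense") # => false
```

The trade-off is that inflected forms such as plurals no longer match by accident, so any that should also force the simple dictionary have to be listed explicitly.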

Here is the full class:

# frozen_string_literal: true

class DictionarySelector < ApplicationService
  SIMPLE_DICTIONARY = 'simple'
  LANGUAGES_MAP = {
    ar: 'arabic', da: 'danish', nl: 'dutch', en: 'english', fi: 'finnish', fr: 'french', de: 'german',
    hu: 'hungarian', id: 'indonesian', ga: 'irish', it: 'italian', lt: 'lithuanian', ne: 'nepali',
    no: 'norwegian', pt: 'portuguese', ro: 'romanian', es: 'spanish', sv: 'swedish', ta: 'tamil', tr: 'turkish'
  }.freeze # keys are ISO 639-1 language codes

  EXCLUDED_STOPWORDS = {
    es: %w[sentido estado]
  }.freeze

  def initialize(locale, query)
    @locale = locale&.to_sym
    @query = query
  end

  private

  attr_reader :locale, :query

  def perform
    return SIMPLE_DICTIONARY unless locale
    return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }

    LANGUAGES_MAP[locale] || SIMPLE_DICTIONARY
  end
end
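The thread never shows ApplicationService, so the following is only one plausible shape for it, written so that the private perform can still be reached: the public instance method call invokes perform with an implicit receiver, which Ruby allows for private methods. DictionarySelector is repeated with LANGUAGES_MAP trimmed to three entries to keep the example self-contained.

```ruby
# Hypothetical base class; the real ApplicationService may differ.
class ApplicationService
  def self.call(*args)
    new(*args).call
  end

  # Public entry point that calls the subclass's private #perform.
  def call
    perform
  end
end

# DictionarySelector from the thread, with LANGUAGES_MAP trimmed for brevity.
class DictionarySelector < ApplicationService
  SIMPLE_DICTIONARY = 'simple'
  LANGUAGES_MAP = { en: 'english', es: 'spanish', pt: 'portuguese' }.freeze
  EXCLUDED_STOPWORDS = { es: %w[sentido estado] }.freeze

  def initialize(locale, query)
    @locale = locale&.to_sym
    @query = query
  end

  private

  attr_reader :locale, :query

  def perform
    return SIMPLE_DICTIONARY unless locale
    return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }

    LANGUAGES_MAP[locale] || SIMPLE_DICTIONARY
  end
end

DictionarySelector.call('es', 'saltar')  # => "spanish"
DictionarySelector.call(:es, 'sentido')  # => "simple"
DictionarySelector.call(nil, 'sentido')  # => "simple"
DictionarySelector.call(:ja, 'saltar')   # => "simple"
```

With the scope defined earlier, a call such as full_text_search("sentido", :es) would therefore search with the simple dictionary, which keeps "sentido" as a lexeme.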