Some words in spanish dictionary don't even match themselves
silva96 opened this issue
PostgreSQL version: 13
Ruby version: 2.7.2
gem version: 2.3.5
context "with the spanish dictionary" do
  before do
    ModelWithPgSearch.pg_search_scope :search_content_with_spanish,
                                      against: :content,
                                      using: {
                                        tsearch: { dictionary: :spanish }
                                      }
  end

  it "returns rows that match the query when stemmed by the spanish dictionary" do
    included = [ModelWithPgSearch.create!(content: "saltar"),
                ModelWithPgSearch.create!(content: "salté"),
                ModelWithPgSearch.create!(content: "saltando")]
    results = ModelWithPgSearch.search_content_with_spanish("saltar")
    expect(results).to match_array(included)
  end

  it "returns rows that match the query when stemmed by the spanish dictionary" do
    included = [ModelWithPgSearch.create!(content: "pedir"),
                ModelWithPgSearch.create!(content: "pedido")]
    results = ModelWithPgSearch.search_content_with_spanish("pedido")
    expect(results).to match_array(included)
  end

  it "returns rows that match the exact query spanish dictionary" do
    included = [ModelWithPgSearch.create!(content: "sentir"),
                ModelWithPgSearch.create!(content: "sentido")]
    results = ModelWithPgSearch.search_content_with_spanish("sentido")
    expect(results).to match_array(included)
  end
end
The last test fails, which is odd: the search is not only failing to stem, it does not even match the exact query word.
To compare the two examples, run ts_debug on each:
SELECT * FROM ts_debug('spanish', 'pedidos');
SELECT * FROM ts_debug('spanish', 'sentidos');
I added a Stack Overflow question, because this looks like a PostgreSQL bug.
Closing as there is nothing we can do on the pg_search gem side to address this issue.
Please look into the stop word issue mentioned in this Stack Overflow comment.
Good luck!
Just to add my two cents, here's my workaround.
With the search configuration below, I select the dictionary dynamically through a service class:
pg_search_scope :full_text_search, lambda { |query, locale|
  {
    against: {
      cached_tag_list: 'A',
      title: 'B',
      plain_description: 'C'
    },
    using: {
      tsearch: { dictionary: DictionarySelector.call(locale, query) }
    },
    ignoring: :accents,
    query: query
  }
}
Inside it, I list each word that I don't want the search to miss:
EXCLUDED_STOPWORDS = {
  es: %w[sentido estado]
}.freeze
Then I decide whether to use the language-specific dictionary or the simple one:
return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }
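Note that `query.include?(word)` is a substring test, so a query such as "estadounidense" also falls back to the simple dictionary because it contains "estado". If that matters, a whole-word, case-insensitive check is one option. The following is a minimal sketch and not part of the original workaround; `excluded_stopword?` is a hypothetical helper name.

```ruby
# Hypothetical helper, not part of the original workaround.
EXCLUDED_STOPWORDS = {
  es: %w[sentido estado]
}.freeze

# True when the query contains one of the excluded words as a whole token.
def excluded_stopword?(locale, query)
  # Split on anything that is not a Unicode letter, so accented words stay whole.
  tokens = query.to_s.downcase.split(/[^\p{L}]+/).reject(&:empty?)
  (EXCLUDED_STOPWORDS[locale&.to_sym].to_a & tokens).any?
end

excluded_stopword?(:es, "El sentido de la vida") # => true
excluded_stopword?(:es, "estadounidense")        # => false
```

The trade-off is that inflected forms are no longer caught implicitly: the substring test matches "sentidos" through "sentido", while the whole-word check needs every form listed explicitly.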
Here is the full class:
# frozen_string_literal: true

class DictionarySelector < ApplicationService
  SIMPLE_DICTIONARY = 'simple'
  LANGUAGES_MAP = {
    ar: 'arabic', da: 'danish', nl: 'dutch', en: 'english', fi: 'finnish', fr: 'french', de: 'german',
    hu: 'hungarian', id: 'indonesian', ga: 'irish', it: 'italian', lt: 'lithuanian', ne: 'nepali',
    no: 'norwegian', pt: 'portuguese', ro: 'romanian', es: 'spanish', sv: 'swedish', ta: 'tamil', tr: 'turkish'
  }.freeze # List_of_ISO_639-1_codes
  EXCLUDED_STOPWORDS = {
    es: %w[sentido estado]
  }.freeze

  def initialize(locale, query)
    @locale = locale&.to_sym
    @query = query
  end

  private

  attr_reader :locale, :query

  def perform
    return SIMPLE_DICTIONARY unless locale
    return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }

    LANGUAGES_MAP[locale] || SIMPLE_DICTIONARY
  end
end
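`ApplicationService` is not shown in this thread. A common convention is a base class whose `call` class method instantiates the service and invokes `perform`. The sketch below assumes that shape (it is a guess, not the actual base class) so that the selector can be tried on its own, with a shortened language map.

```ruby
# frozen_string_literal: true

# Assumed base class: the thread does not show ApplicationService, so this
# shape (class-level `call` delegating to a private `perform`) is a guess.
class ApplicationService
  def self.call(*args)
    new(*args).send(:perform)
  end
end

# Trimmed copy of the selector, with a shorter language map for brevity.
class DictionarySelector < ApplicationService
  SIMPLE_DICTIONARY = 'simple'
  LANGUAGES_MAP = { en: 'english', es: 'spanish' }.freeze
  EXCLUDED_STOPWORDS = { es: %w[sentido estado] }.freeze

  def initialize(locale, query)
    @locale = locale&.to_sym
    @query = query
  end

  private

  attr_reader :locale, :query

  def perform
    return SIMPLE_DICTIONARY unless locale
    return SIMPLE_DICTIONARY if EXCLUDED_STOPWORDS[locale].to_a.any? { |word| query.include?(word) }

    LANGUAGES_MAP[locale] || SIMPLE_DICTIONARY
  end
end

DictionarySelector.call(:es, 'pedido')  # => "spanish"
DictionarySelector.call(:es, 'sentido') # => "simple"
DictionarySelector.call(:xx, 'sentido') # => "simple" (unknown locale)
DictionarySelector.call(nil, 'pedido')  # => "simple" (no locale)
```

Falling back to `simple` disables stemming for the whole query, so a search for "sentido" will not match "sentir". That is the price of getting an exact match at all for those words.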