How to search for a substring in a word
jsmartt opened this issue
Is it possible to configure pg_search to find a substring of a word? For example, I have a column named fqdn, and I want to search for a subdomain.
Here's what I have in my model right now:
pg_search_scope :search_for, against: column_names, using: { tsearch: { prefix: true, negation: true } }
Let's say a record exists with a fqdn of host1.site1.example.com...
Model.search_for('host1') returns the result properly.
Model.search_for('site1') returns nothing. I'd like it to return that record.
I haven't been able to come up with a configuration that works yet. Any help here would be much appreciated. Thanks!
This seems to be one of the most basic search requirements and I couldn't find a way to do it either so far.
negation: true seems useless here:
"If you want to exclude certain words, you can set :negation to true. Then any term that begins with an exclamation point ! will be excluded from the results."
prefix: true does not seem to promise what you're striving to achieve:
"full text search matches on whole words by default. If you want to search for partial words, however, you can set :prefix to true"
It will search for partial words, but it only matches words that start with your search term.
tsearch's capabilities seem to be limited in this regard, so you'll have to use trigram-based search; see https://stackoverflow.com/questions/2513501/postgresql-full-text-search-how-to-search-partial-words
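To see why a trigram-based search can find site1 where tsearch cannot, here is a rough, self-contained Ruby sketch that approximates how the pg_trgm extension breaks text into trigrams. The trigrams and similarity helpers are hypothetical illustrations written for this thread, not part of pg_search or PostgreSQL, and they only mimic the documented behaviour (lowercasing, splitting on non-alphanumeric characters, padding each word with two leading spaces and one trailing space).

```ruby
require "set"

# Approximates pg_trgm trigram extraction: lowercase the text, split it on
# non-alphanumeric characters, pad each word with two leading spaces and one
# trailing space, then collect every 3-character window.
def trigrams(text)
  words = text.downcase.scan(/[a-z0-9]+/)
  grams = words.flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end
  grams.to_set
end

# Shared trigrams divided by all distinct trigrams (a Jaccard-style ratio).
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  return 0.0 if ta.empty? || tb.empty?

  (ta & tb).size.to_f / (ta | tb).size
end

fqdn = "host1.site1.example.com"
fully_contained = trigrams("site1").subset?(trigrams(fqdn))

puts similarity("site1", fqdn) # prints 0.25
puts fully_contained           # prints true
```

Every trigram of site1 appears in the hostname, but the overall similarity of 0.25 sits below pg_trgm's default threshold of 0.3, so a plain trigram scope may still miss the record. pg_search documents a word_similarity option for its trigram feature (using: { trigram: { word_similarity: true } }) that compares the query against parts of the text rather than the whole string. That is probably the closer fit here, though it would need testing against your own data.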
Yes, tsearch is a bit limited for searching in the middle of a string.
You can use the ts_debug SQL function if you want to figure out how PostgreSQL is parsing your text.
# SELECT * FROM ts_debug('simple', 'test.example.com');
 alias | description |      token       | dictionaries | dictionary |      lexemes
-------+-------------+------------------+--------------+------------+--------------------
 host  | Host        | test.example.com | {simple}     | simple     | {test.example.com}
(1 row)
For example, the built-in simple and english text search configurations seem to recognize an entire hostname as a single lexeme. I believe this means your query would need to be the entire string (or a prefix of it, with prefix: true) to match.
It may be possible to implement your own parser. You could also pre-process your text before indexing it. For example, you could split the hostname on . and store the parts in a separate column.
See https://www.postgresql.org/docs/current/textsearch-debugging.html for more details. Essentially we are limited by what the database provides.
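The pre-processing idea above could be sketched roughly like this in plain Ruby. The fqdn_search_terms helper is hypothetical and named only for illustration; it turns a hostname into space-separated labels so that the text search parser sees each label as its own word.

```ruby
# Hypothetical helper: splits an FQDN on "." and joins the labels with
# spaces, so each label can be tokenized separately by full text search.
def fqdn_search_terms(fqdn)
  fqdn.to_s.downcase.split(".").reject(&:empty?).join(" ")
end

puts fqdn_search_terms("host1.site1.example.com")
# prints: host1 site1 example com
```

In a Rails model, one option would be to fill a separate column (say, a hypothetical fqdn_terms) from a before_save callback and include that column in the against: list of the pg_search_scope. With prefix: true, a search for site1 should then be able to match the stored site1 label.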