Quoted searches with underscores return non-exact matches
murdo-moj opened this issue
Hello, the DataHub demo instance's search appears to be broken. I am looking at an example from the docs:
If you want to:
- Exact match on term or phrase
- "datahub_schema" Sample results
- datahub_schema Sample results
- Enclosing one or more terms with double quotes will enforce exact matching on these terms, preventing further tokenization.
Both searches return the same 393 results, so the quotes have no effect. Perhaps they are being stripped somewhere before the query is passed to Elasticsearch?
Further context from some experimentation:
The underscore character does appear to be adding some wildcard functionality that spaces do not.
Here are some example searches in the demo instance and the number of results returned:
| Search term | Demo Link | Number of results |
|---|---|---|
| "datahub_schema" | (demo) | 393 |
| datahub_schema | (demo) | 393 |
| "datahub schema" | (demo) | 2 |
| datahub schema | (demo) | 42 |
| datahub \| schema | (demo) | 393 |
| datahub | (demo) | 51 |
| schema | (demo) | 384 |
It appears that the underscore character forces an 'OR' search for the words it separates, regardless of the presence of quotes, whereas a space character leads to an 'AND' search without quotes and an 'EXACT' search with quotes.
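As a quick sanity check on the 'OR' reading, the counts above can be run through inclusion-exclusion. This is only a sketch in Python using the numbers I observed. It assumes the unquoted `datahub schema` search is a pure 'AND' of the two terms, which I have not confirmed from the query that is actually sent. The `union_size` helper is just for illustration.

```python
# Result counts observed on the demo instance (copied from the table above).
counts = {
    "datahub": 51,
    "schema": 384,
    "datahub schema": 42,      # unquoted, space-separated
    "datahub | schema": 393,   # explicit OR
    "datahub_schema": 393,     # unquoted, underscore
    '"datahub_schema"': 393,   # quoted, underscore
}


def union_size(a: int, b: int, both: int) -> int:
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return a + b - both


# Assumption: "datahub schema" (42 results) is the AND of the two single terms.
predicted_or = union_size(
    counts["datahub"], counts["schema"], counts["datahub schema"]
)

print(predicted_or)                                  # 393
print(predicted_or == counts["datahub | schema"])    # True
print(predicted_or == counts['"datahub_schema"'])    # True
```

The predicted size of `datahub OR schema` matches the explicit `datahub | schema` search and both underscore searches, which is at least consistent with the underscore being treated as an 'OR' separator whether or not the term is quoted.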