datahub-project / datahub

The Metadata Platform for your Data Stack

Home Page: https://datahubproject.io

Quoted searches with underscores return non-exact matches

murdo-moj opened this issue

Hello, the DataHub demo instance's search appears to be broken. I am looking at an example from the docs:

If you want to:

  • Exact match on term or phrase
    • "datahub_schema" Sample results
    • datahub_schema Sample results
    • Enclosing one or more terms with double quotes will enforce exact matching on these terms, preventing further tokenization.

Both of these searches return the same 393 results, so the quotes aren't doing anything. Perhaps they are being stripped somewhere before the query is passed to Elasticsearch?

Further context from some experimentation:

The underscore character does appear to be adding some wildcard functionality that spaces do not.
Here are some example searches in the demo instance and the number of results returned:

Search term          Demo link    Number of results
"datahub_schema"     (demo)       393
datahub_schema       (demo)       393
"datahub schema"     (demo)       2
datahub schema       (demo)       42
datahub | schema     (demo)       393
datahub              (demo)       51
schema               (demo)       384

It appears as though the underscore character is forcing an 'OR' search for the words it separates, regardless of the presence of quotes (whereas a space character leads to an 'AND' search without quotes, and an 'EXACT' search with quotes).
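
As a rough sanity check on the 'OR' reading, the counts reported above can be run through inclusion-exclusion. This is only a sketch: it assumes the unquoted `datahub schema` search really is a plain 'AND' of the two terms, and the helper name `expected_or_count` is made up for illustration. If that assumption holds, the number of results matching `datahub` OR `schema` should be 51 + 384 - 42.

```python
# Result counts as reported for the demo instance in this issue.
counts = {
    "datahub": 51,
    "schema": 384,
    "datahub schema": 42,      # assumed here to mean: datahub AND schema
    "datahub | schema": 393,   # explicit OR
    '"datahub_schema"': 393,   # quoted, with underscore
    "datahub_schema": 393,     # unquoted, with underscore
}


def expected_or_count(a: int, b: int, a_and_b: int) -> int:
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return a + b - a_and_b


# Hypothetical helper applied to the observed numbers.
union = expected_or_count(
    counts["datahub"], counts["schema"], counts["datahub schema"]
)

print(union)  # 393
print(
    union
    == counts["datahub | schema"]
    == counts['"datahub_schema"']
    == counts["datahub_schema"]
)  # True
```

The predicted union size matches the 393 results returned by both underscore searches and by the explicit `datahub | schema` search. That is consistent with the underscore being treated as an 'OR' between the two words, although it does not show where in the pipeline this happens.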