datahub-project / datahub

The Metadata Platform for your Data Stack

Home Page:https://datahubproject.io

Search queries require all terms to match, or nothing is returned

MatMoore opened this issue

Describe the bug
If users enter search queries with multiple terms, the query must be extremely precise to return results. Datahub will not return matches unless all of the search terms are present.

To Reproduce
Steps to reproduce the behavior:

  1. Pick any entity in the catalogue
  2. Copy and paste some words from its description into search - it should show up in the search results
  3. Add or change a single term to something that doesn't match and then repeat the search - now nothing will be returned

A contrived example on the demo instance: searching for "This table has basic information about a customer, as well as some derived facts based on a customer's orders" returns the matching entity, while "This table has simple information about a customer, as well as some derived facts based on a customer's orders" (one word changed) returns nothing.
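
To make the difference concrete, here is a toy sketch (plain Python, not DataHub or Elasticsearch code) that tokenizes naively and compares all-terms (AND) matching with any-term (OR) matching on the two queries from the demo example. The tokenizer is an assumption: a crude stand-in for a real analyzer, with no stemming or stop words.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase and split on non-word characters; a crude stand-in for an
    # Elasticsearch analyzer (no stemming, no stop-word removal).
    return set(re.findall(r"\w+", text.lower()))


def matches(query: str, document: str, operator: str) -> bool:
    terms = tokenize(query)
    doc_terms = tokenize(document)
    if operator == "and":
        # Every query term must appear in the document.
        return terms <= doc_terms
    # "or": at least one query term must appear in the document.
    return bool(terms & doc_terms)


description = ("This table has basic information about a customer, as well as "
               "some derived facts based on a customer's orders")
original = description
changed = description.replace("basic", "simple")

for op in ("and", "or"):
    print(op, matches(original, description, op), matches(changed, description, op))
```

Under AND semantics the one-word change ("simple" is not in the description) drops the entity; under OR semantics it is still found, which is the behaviour this issue expects.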

The behaviour is the same in both the React frontend and the GraphQL API.

Expected behavior
Provided that quotes are not used around the search term, I would expect that only one term needs to match for an entity to be returned in the search results. Entities that match some but not all terms should be ranked lower but not excluded from the result set.
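
A minimal sketch of that expected behaviour, assuming a deliberately naive score equal to the fraction of query terms found in each document (a hypothetical stand-in for real relevance scoring). The `search` function and the sample documents are illustrative only.

```python
import re


def tokenize(text: str) -> set[str]:
    # Naive tokenizer: lowercase, split on non-word characters.
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Return (name, score) for every document matching at least one term,
    best matches first. Score = fraction of query terms found."""
    terms = tokenize(query)
    if not terms:
        return []
    results = []
    for name, text in documents.items():
        hits = len(terms & tokenize(text))
        if hits:  # partial matches are kept, just scored lower
            results.append((name, hits / len(terms)))
    # Sort by descending score, then by name for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    "customers": "basic information about a customer",
    "orders": "one row per customer order",
    "products": "product catalogue",
}
print(search("basic customer information", docs))
```

Here "customers" matches all three terms and ranks first, "orders" matches only "customer" and still appears with a lower score, and "products" matches nothing and is excluded.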

This is likely to be particularly problematic for users who are less sure of what they are looking for, or who tend towards natural language queries.

In our use case we are hoping to roll out the catalogue to a very diverse set of users, and there will be some user groups who work less closely with the data. These users would be impacted a lot if the search has low recall.

Desktop:

  • OS: MacOS
  • Browser: Chrome
  • Version: Tested in versions 0.13.1, 0.12.0

Additional context
DataHub has an exactMatch config setting, but its exclusive flag defaults to false, so it doesn't explain why we are seeing this exclusive behaviour.

      ## Configuration around exact matching for search
      exactMatch:
        ## if false will only apply weights, if true will exclude non-exact
        exclusive: false
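
The comment in that config describes two modes. Below is a hypothetical sketch of those documented semantics only (not DataHub's actual implementation; `Hit`, `apply_exact_match` and the `exact_boost` value are made up for illustration): with `exclusive: false` an exact match only adds weight and nothing is removed, while `exclusive: true` drops non-exact results.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    score: float
    exact: bool  # whether the hit matched the query exactly


def apply_exact_match(hits: list[Hit], exclusive: bool,
                      exact_boost: float = 2.0) -> list[Hit]:
    """Hypothetical illustration of the documented `exactMatch.exclusive` flag."""
    if exclusive:
        # true: exclude non-exact matches entirely
        kept = [h for h in hits if h.exact]
    else:
        # false: keep every hit, only re-weight the exact matches
        kept = [Hit(h.name, h.score * exact_boost if h.exact else h.score, h.exact)
                for h in hits]
    return sorted(kept, key=lambda h: h.score, reverse=True)


hits = [Hit("customers", 1.0, exact=False), Hit("customer_orders", 0.8, exact=True)]
print([h.name for h in apply_exact_match(hits, exclusive=False)])
print([h.name for h in apply_exact_match(hits, exclusive=True)])
```

With the default `exclusive: false`, both hits survive and only their order changes, which is why this setting alone should not make partial matches disappear.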

Requiring every term to match is also not the default behaviour of Elasticsearch's simple query string, whose default_operator is OR:

For example, a query string of capital of Hungary is interpreted as capital OR of OR Hungary.
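
For reference, a sketch of an Elasticsearch `simple_query_string` request body showing where that operator is set. The issue does not show what DataHub actually sends, so the helper name and the `description` field are assumptions; the body is only built and printed, not sent to a cluster.

```python
import json


def simple_query_string_body(query: str, fields: list[str],
                             operator: str = "OR") -> dict:
    """Build an Elasticsearch `simple_query_string` search body.

    default_operator "OR" (Elasticsearch's default) matches documents
    containing any term; "AND" requires every term to be present."""
    return {
        "query": {
            "simple_query_string": {
                "query": query,
                "fields": fields,
                "default_operator": operator,
            }
        }
    }


body = simple_query_string_body("capital of Hungary", ["description"])
print(json.dumps(body, indent=2))
```

Passing `operator="AND"` would reproduce the all-terms-required behaviour described in this issue. Elasticsearch also accepts a `minimum_should_match` parameter on this query, which sits between the two extremes.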