o19s / quepid

Improve your Elasticsearch, OpenSearch, Solr, Vectara, Algolia and Custom Search search quality.

Home Page: http://www.quepid.com

Fix RRE export format

epugh opened this issue

Describe the bug

We currently export RRE as:

"relevant_documents": {
  "1.0": [
    "l_1559"
  ]
}

But in talking to @jillesvangurp, we figured out that it should be:

"relevant_documents_fixed": {
  "l_1559": {
    "gain": 1.0
  }
}
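A minimal Python sketch of the conversion between the two shapes, assuming the current export groups document ids under numeric string gain keys such as "1.0" and the target maps each document id to an object with a `gain` field. The function name `to_per_document_gains` is hypothetical; this is not Quepid's actual export code, and it returns only the inner mapping so the choice of outer key is left open.

```python
import json
from typing import Dict, List


def to_per_document_gains(grouped: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Turn {"<gain>": [doc ids]} into {doc_id: {"gain": <gain>}}.

    Assumes every key is a numeric string such as "1.0"; gains are kept as floats.
    """
    per_doc: Dict[str, Dict[str, float]] = {}
    for gain_key, doc_ids in grouped.items():
        gain = float(gain_key)
        for doc_id in doc_ids:
            # A document rated twice is ambiguous, so refuse rather than overwrite.
            if doc_id in per_doc:
                raise ValueError(f"document {doc_id!r} listed under more than one gain")
            per_doc[doc_id] = {"gain": gain}
    return per_doc


# The shape Quepid currently exports, taken from the example above.
old = {"1.0": ["l_1559"]}
print(json.dumps(to_per_document_gains(old)))
```

Grouping by document rather than by gain also makes it impossible to express "one document, two gains", which the old shape silently allowed.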

Just for reference, I had a go at implementing an RRE import for Rankquest Studio. The implementation lives here in rankquest-core:

https://github.com/jillesvangurp/rankquest-core/blob/main/src/commonMain/kotlin/com/jilesvangurp/rankquest/core/rre-support.kt

It's easy to add more features or alternative formats there.

I based my implementation on the one example in the RRE repo that I was able to find:

https://github.com/SeaseLtd/rated-ranking-evaluator/blob/master/rre-core/src/test/resources/engine_evaluation_tests/ratings/ratings_example.json

I'm happy to do some more work on this but it would help to get some better examples to work with.

Another open question is whether the rating should be an Int or a Double. I'm treating it as an Int so far.
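One way an importer could sidestep the Int-versus-Double question is to accept either form and only narrow to an integer when no information is lost. A sketch in Python; `normalize_gain` is a hypothetical helper, and whether RRE itself accepts fractional gains is not settled in this thread.

```python
from typing import Union


def normalize_gain(raw: Union[str, int, float]) -> Union[int, float]:
    """Parse a gain written as "1.0", 1.0 or 1.

    Integral values come back as int (so "3.0" becomes 3); a value with a
    fractional part, such as 0.5, stays a float instead of being truncated.
    """
    value = float(raw)
    if value.is_integer():
        return int(value)
    return value


for raw in ["1.0", 3, 2.0, "0.5"]:
    print(repr(raw), "->", repr(normalize_gain(raw)))
```

Keeping fractional values as floats avoids silently turning a 0.5 rating into 0, which would change the graded relevance.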

Here is the new format for RRE:

{
    "id_field": "id",
    "index": "tmdb",
    "template": "template.json",
    "queries": [
        {
            "placeholders": {
                "$query": "=cmd"
            }
        },
        {
            "placeholders": {
                "$query": "First Query"
            }
        },
        {
            "placeholders": {
                "$query": "Second Query"
            },
            "relevant_documents": {
                "docb": {
                    "gain": 1
                },
                "doca": {
                    "gain": 3
                }
            }
        },
        {
            "placeholders": {
                "$query": "Third Query"
            }
        },
        {
            "placeholders": {
                "$query": "Fourth Query"
            }
        }
    ]
}
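A sketch of how an importer might walk this format in Python, assuming (as in the example) that each query's text sits under the `$query` placeholder and that `relevant_documents` is optional per query. The function `read_rated_queries` is hypothetical and is not part of rankquest-core or Quepid.

```python
import json
from typing import Dict, List, Tuple

# A trimmed copy of the example above.
EXAMPLE = """
{
  "id_field": "id",
  "index": "tmdb",
  "template": "template.json",
  "queries": [
    {"placeholders": {"$query": "First Query"}},
    {"placeholders": {"$query": "Second Query"},
     "relevant_documents": {"docb": {"gain": 1}, "doca": {"gain": 3}}}
  ]
}
"""


def read_rated_queries(doc: dict) -> List[Tuple[str, Dict[str, int]]]:
    """Return (query text, {doc id: gain}) for every query in the file.

    Queries without "relevant_documents" are kept with an empty mapping,
    so unrated queries are not silently dropped.
    """
    rated = []
    for query in doc.get("queries", []):
        text = query["placeholders"]["$query"]
        relevant = query.get("relevant_documents", {})
        gains = {doc_id: entry["gain"] for doc_id, entry in relevant.items()}
        rated.append((text, gains))
    return rated


for text, gains in read_rated_queries(json.loads(EXAMPLE)):
    print(text, gains)
```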

It's still a bit different from the format I linked above, which has topics, query_groups, and then queries. Did you test the format with RRE?
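For comparison, a sketch that nests the flat `queries` list into a topics, then query_groups, then queries hierarchy. It assumes, based on the RRE example file linked above rather than a tested RRE run, that `relevant_documents` sits on each query group and that the template is named on each query. The function `nest_into_topics`, the single topic description, and the one-group-per-query naming are all hypothetical placeholders.

```python
import json
from typing import Any, Dict


def nest_into_topics(flat: Dict[str, Any], topic: str = "Imported from Quepid") -> Dict[str, Any]:
    """Wrap each flat query in its own query group under a single topic.

    Assumed layout: topics -> query_groups -> queries, with
    relevant_documents on the query group and the template on each query.
    """
    groups = []
    for query in flat.get("queries", []):
        text = query["placeholders"]["$query"]
        groups.append({
            "name": text,  # placeholder: one group per query, named after its text
            "queries": [{"template": flat.get("template"), "placeholders": query["placeholders"]}],
            "relevant_documents": query.get("relevant_documents", {}),
        })
    return {
        "index": flat.get("index"),
        "id_field": flat.get("id_field"),
        "topics": [{"description": topic, "query_groups": groups}],
    }


flat = {
    "id_field": "id",
    "index": "tmdb",
    "template": "template.json",
    "queries": [{"placeholders": {"$query": "Second Query"},
                 "relevant_documents": {"doca": {"gain": 3}}}],
}
print(json.dumps(nest_into_topics(flat), indent=2))
```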

So I don't have topics, and I thought some of that was optional. Honestly, I haven't tested it, because it's been a while since I've used RRE.

So I wonder if I should just support the Rankquest format directly instead?

Depends: who else is currently using the export?

I modeled my importer after the one sample I found in their repo. But of course it would be nice if it lined up with what people are actually using and expecting today.

Otherwise, I'm open to suggestions and in no way tied to the RRE format.

Okay, I think the better route is to introduce a RankQuest format; that way each format can evolve as market demand drives it. Do you have an example of an export file I can use?

Here's an example:

movie-quotes-rated-searches-2024-01-22T16 56 59.954Z.json

You need a matching search plugin configuration that can handle the parameters. The parameter map (search context) contains only string values. The comment and tag fields are optional.

The label is optional too, but it's nice to have some hint about what the document is. Usually the document title is appropriate.

The size parameter in the search context sets how many results to fetch; the rest is similar to the parameters in RRE.
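A rough Python model of the rated-search fields described above: a string-only search context (including `size`), per-document ratings with an optional label, and optional comment and tags. The class and field names (`RatedSearch`, `Rating`, `search_context`, `document_id`, and so on) are hypothetical, chosen for illustration; the attached example file, not this sketch, defines the actual Rankquest export schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rating:
    document_id: str
    rating: int
    label: Optional[str] = None  # optional hint, e.g. the document title


@dataclass
class RatedSearch:
    id: str
    search_context: Dict[str, str]  # every parameter value is a string
    ratings: List[Rating]
    comment: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the all-strings rule, so size must be "10", not 10.
        for key, value in self.search_context.items():
            if not isinstance(value, str):
                raise TypeError(f"search context value for {key!r} must be a string")


search = RatedSearch(
    id="q1",
    search_context={"q": "Second Query", "size": "10"},  # size: number of results to return
    ratings=[Rating("doca", 3, label="Doc A title"), Rating("docb", 1)],
)
print(json.dumps(asdict(search), indent=2))
```

Validating the search context at construction time surfaces a non-string parameter early, before it reaches a search plugin configuration that expects strings.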