Support for Tags/Properties in Rule Import

Question

dobestler opened this issue 3 years ago · comments

As a Querqy user who already has many rules, I would like to migrate to SMUI. Some rules are annotated with various properties, which need to be imported as well:

"notebook" =>
SYNONYM: laptop
@{
"property_1" : [ "abcd" ],
"property_2" : [ "10", "20", "30" ]
}@

When importing the above rule, I would like the properties and their values to be associated with the rule in SMUI.
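To make the request concrete, here is a minimal sketch of pulling the property block out of such a rule. It is not SMUI's importer: the function name `extract_properties` is hypothetical, and the sketch assumes only the multi-line `@{ ... }@` form shown above, with the content between the markers being valid JSON.

```python
import json
import re

RULE = '''"notebook" =>
SYNONYM: laptop
@{
"property_1" : [ "abcd" ],
"property_2" : [ "10", "20", "30" ]
}@
'''

def extract_properties(rule_text: str) -> dict:
    """Return the JSON object between '@{' and '}@', or {} if there is none."""
    match = re.search(r"@(\{.*?\})@", rule_text, flags=re.DOTALL)
    if match is None:
        return {}
    return json.loads(match.group(1))

props = extract_properties(RULE)

# Flatten to (property, value) pairs; a scalar value is treated as a one-element list.
tags = [
    (name, value)
    for name, values in props.items()
    for value in (values if isinstance(values, list) else [values])
]
```

Each resulting pair would then correspond to one tag on the imported rule.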

dobestler · Answer 1 · Tue May 04 2021 16:23:50 GMT+0800 (China Standard Time)

SMUI has a concept of predefined tags so that the SMUI User can tag rules using these tags.

@pbartusch What should be the strategy for the import in case of "unknown" tags?
a) Import only tags that match predefined tags already existing in the SMUI model. Unknown tags are returned in the HTTP response (import statistics) so that the importing user knows which tags were ignored.
b) Import all tags, creating unknown tags on the fly (and reusing predefined tags that already exist in SMUI, as in a)).

IMHO a) is the better way because the import has fewer side effects and stays simple. Maybe b) could be a later enhancement in case the import becomes a bigger use case. It could then become a configuration option for the import.
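Option a) could be sketched roughly as follows. The function name and the keys of the returned dictionary are made up for illustration; the dictionary merely stands in for the import statistics in the HTTP response, whose real shape is not specified here.

```python
def import_tags_lenient(imported_tags, predefined_tags):
    """Option a): keep only predefined tags and report the ones that were ignored."""
    predefined = set(predefined_tags)
    imported = [t for t in imported_tags if t in predefined]
    ignored = [t for t in imported_tags if t not in predefined]
    # Hypothetical statistics payload; not SMUI's actual response format.
    return {"imported_tags": imported, "ignored_tags": ignored}

stats = import_tags_lenient(
    [("property_1", "abcd"), ("property_2", "10")],
    [("property_1", "abcd")],
)
```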

pbartusch · Answer 2 · Wed May 05 2021 00:08:43 GMT+0800 (China Standard Time)

Yes @dobestler, I agree with a), but I'd prefer a stricter variant that aborts the commit of the imported rules if any unexpected tags occur.

But there are two different config settings:

No predefined tags -> basically all tags are allowed. The Search Manager defines them on the fly, and so could the import. I don't know of any live setup where SMUI is operated this way, but it still makes sense to support it.
Predefined tags act as an "allow list" -> This is the standard when working with tags in SMUI.

dobestler · Answer 3 · Wed May 05 2021 03:45:47 GMT+0800 (China Standard Time)

Okay. I see these two config settings which seem to play a role then:

toggle.rule-tagging -- true or false
toggle.predefined-tags-file -- "" or path_to_json_file

Following your suggestion the import behaviour would be like this:

1. rule-tagging = false => tags are ignored by the import.
2. rule-tagging = true AND predefined-tags-file = "" => all tags are imported. Each tag is either newly created on the fly or reused if a definition already exists in the DB.
3. rule-tagging = true AND predefined-tags-file is set => all tags are imported if there is an existing definition in the DB. The import fails in case of unknown tags.
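The three cases could be expressed as a single decision function along these lines. This is only a sketch of the proposed behaviour: `resolve_tags`, `UnknownTagError`, and the tuple representation of tags are assumptions made for illustration, and the two parameters simply mirror the `toggle.rule-tagging` and `toggle.predefined-tags-file` settings.

```python
class UnknownTagError(Exception):
    """Raised when predefined tags are enforced and the import contains others."""

def resolve_tags(imported, existing, rule_tagging, predefined_tags_file):
    """Return (tags_to_attach, tags_to_create) for one imported rule.

    imported: list of (property, value) pairs found on the rule
    existing: set of (property, value) pairs already defined in the DB
    """
    if not rule_tagging:
        # Tagging disabled: tags are ignored by the import.
        return [], []
    if predefined_tags_file == "":
        # No allow list: reuse existing definitions, create the rest on the fly.
        to_create = [t for t in imported if t not in existing]
        return list(imported), to_create
    # Allow list active: any unknown tag fails the import.
    unknown = [t for t in imported if t not in existing]
    if unknown:
        raise UnknownTagError(f"unknown tags: {unknown}")
    return list(imported), []

existing = {("tenant", "a")}
imported = [("tenant", "a"), ("tenant", "b")]

ignored_case = resolve_tags(imported, existing, False, "")
on_the_fly_case = resolve_tags(imported, existing, True, "")
try:
    resolve_tags(imported, existing, True, "tags.json")
    strict_failed = False
except UnknownTagError:
    strict_failed = True
```

Raising an exception in the last case matches the stricter variant discussed above, where the whole import is aborted rather than partially committed.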

So this basically amounts to my proposed option b), with the config settings mentioned above acting as the configuration option. Cases 1 and 3 are simple, while case 2 might take more effort. I'll have a look at it.

@pbartusch How should the importer treat the querqy-internal properties @_log and @_id?

pbartusch · Answer 4 · Wed May 05 2021 04:10:57 GMT+0800 (China Standard Time)

Yes to all three cases above, @dobestler. Case 2 might be simple as well, because no predefined tag entry might be necessary (only an entry in the rule model), but that depends on the implementation, which I did not write myself.

I would (explicitly) ignore _log and _id for now, if that's fine for your current use case. _id has the potential to become the rule's id, but only after successful validation (SMUI renders those in its own output format). That only applies to rules.txt files rendered by SMUI, though. Maybe this becomes a GET parameter option for the import route in the future ...
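Explicitly skipping the Querqy-internal properties could look like the following sketch. The constant and function names are hypothetical; only the property names `_id` and `_log` come from the discussion above.

```python
# Properties that the import would skip explicitly (assumption: exactly these two).
INTERNAL_PROPERTIES = {"_id", "_log"}

def drop_internal_properties(props: dict) -> dict:
    """Return a copy of the rule properties without the Querqy-internal ones."""
    return {name: value for name, value in props.items() if name not in INTERNAL_PROPERTIES}

cleaned = drop_internal_properties(
    {"_id": "rule-1", "_log": "some log message", "tenant": ["a"]}
)
```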

dobestler · Answer 5 · Fri May 07 2021 23:18:20 GMT+0800 (China Standard Time)

@pbartusch The importer currently consolidates different search inputs that are synonyms of each other and otherwise have identical rules into one definition with a so-called undirected synonym. Should the tags also be part of that "equality check"?

Example:
notebook =>
SYNONYM: laptop
DOWN(10): something
@{
"tenant" : [ "a"],
"lang" : [ "de-CH", "en-US" ]
}@

laptop =>
SYNONYM: notebook
DOWN(10): something
@{
"tenant" : [ "a", "b" ],
"lang" : [ "de-CH", "en-US" ]
}@

If tags are part of the equality check, the import would not consolidate the search inputs above, because the tenant properties are not identical. I think this is the safer (and stricter) way. If we consolidated such cases by ignoring the differences in tags, the question would be which tags to use for the consolidated definition. What do you think?
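A strict equality check that includes tags might be sketched like this. It is not the importer's actual logic: the `ImportedRule` model and `can_consolidate` are hypothetical, and tags are again represented as (property, value) pairs so that the comparison ignores ordering.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImportedRule:
    term: str
    synonyms: frozenset     # synonym terms of this search input
    other_rules: frozenset  # remaining rule lines, e.g. "DOWN(10): something"
    tags: frozenset         # (property, value) pairs

def can_consolidate(a: ImportedRule, b: ImportedRule) -> bool:
    """True if a and b may be merged into one undirected synonym definition."""
    mirrored = a.synonyms == {b.term} and b.synonyms == {a.term}
    # Strict variant: the tags must match exactly, not only the other rules.
    return mirrored and a.other_rules == b.other_rules and a.tags == b.tags

lang_tags = {("lang", "de-CH"), ("lang", "en-US")}
notebook = ImportedRule(
    "notebook", frozenset({"laptop"}), frozenset({"DOWN(10): something"}),
    frozenset({("tenant", "a")} | lang_tags),
)
laptop = ImportedRule(
    "laptop", frozenset({"notebook"}), frozenset({"DOWN(10): something"}),
    frozenset({("tenant", "a"), ("tenant", "b")} | lang_tags),
)

consolidate = can_consolidate(notebook, laptop)
```

For the two rules from the example, the check fails solely because of the extra tenant value, so both search inputs would stay separate definitions.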

pbartusch · Answer 6 · Sat May 08 2021 01:00:25 GMT+0800 (China Standard Time)

I am also in favor of the strict approach, as having the same term in two rules with different tags might carry different semantics.