FR: expressions/grouping in searches
mm12 opened this issue
A widely requested feature. I might tinker with app/logical/tag_query.rb myself to get partway there. Here are some notes:
- Avoiding conflicts with tag names containing grouping characters. Some options:
  - Require space around grouping: `(~A ~B)` does not work, but `( ~A ~B )` does
  - Restrict tag characters (such as disallowing `{` and `}`) and use them for grouping
  - Ignore the problem and hope nothing breaks (not recommended)
- danbooru has an implementation of this in `def expr`
- Should modifiers on grouping be allowed?
  - `-(A B C)` = `(-A -B -C)` or `~(A B)` = `(~A ~B)`
  - but what about `-(~A ~B)` (i.e. has to be missing one of A or B)?
why: Currently, ORing and NOTing things is query-wide, which is a huge pain. Using grouping means you can do things like `(~A ~B) (~C ~D) E`, which require multiple queries using the current system.
edit: notes: `ElasticPostQueryBuilder.new(query).build`
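To make the `(~A ~B) (~C ~D) E` example concrete, here is a minimal sketch of what a grouped query means when checked against a single post's tags. The nested-array tree format and the `matches?` helper are hypothetical illustrations, not e621ng's representation; the real search runs in Elasticsearch, not in Ruby.

```ruby
require "set"

# Hypothetical query tree (not e621ng's representation): a String is a tag,
# [:and, *children], [:or, *children] and [:not, child] are groups.
def matches?(node, post_tags)
  return post_tags.include?(node) if node.is_a?(String)

  op, *children = node
  case op
  when :and then children.all? { |child| matches?(child, post_tags) }
  when :or  then children.any? { |child| matches?(child, post_tags) }
  when :not then !matches?(children.fetch(0), post_tags)
  else raise ArgumentError, "unknown operator #{op.inspect}"
  end
end

# (~A ~B) (~C ~D) E as a single grouped query
query = [:and, [:or, "A", "B"], [:or, "C", "D"], "E"]

matches?(query, Set["B", "C", "E"]) # => true
matches?(query, Set["A", "B", "E"]) # => false, neither C nor D
```

With flat, query-wide `~`, the two OR sets cannot be kept apart, which is why this currently takes multiple queries.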
Ah yes, the feature request that's nearly as old as e621 itself finally making its way to the GitHub issues. It may forever remain a pipe dream, but we can always hope.
> Avoiding conflicts with tag names containing grouping characters.
> Require space around grouping: `(~A ~B)` does not work, but `( ~A ~B )` does
Would the easiest way be to just not allow `(` to be the first character in a tag? Only a handful of current tags would violate this, and most of those are invalid anyway. Tags can obviously still end with `)`, but that would be easier to handle since we know where the grouping starts.
There might be some issues with stray parentheses in tags, but I don't think any of those should actually be valid either. A rule could probably be added to the tag validator to prevent any future tags with stray parentheses from being created, e.g. `person_character)`.
It would be easy to break down the groups from this string:
(a_(character) b) (~b_(artist) ~c_(species)) (~pony_(mlp) ~pony_(eg) -pony)
into the extracted result:
(a_(character) b)
(~b_(artist) ~c_(species))
(~pony_(mlp) ~pony_(eg) -pony)
Below is the code that produced that result. I wouldn't recommend actually using it, as I threw it together pretty sloppily; it is only provided as a proof of concept.
def handle_tag_groups(input)
  tag_groups = []
  current_group = +""
  nesting_level = 0

  input.each_char do |char|
    nesting_level += 1 if char == '('
    nesting_level -= 1 if char == ')'
    current_group << char

    # A ")" that brings us back to depth 0 closes a top-level group.
    # Text outside parentheses is not split out: it is prepended to the
    # next group, or dropped if no group follows it.
    if nesting_level == 0 && char == ')'
      tag_groups << current_group.strip
      current_group = +""
    end
  end

  tag_groups
end

tag_groups = handle_tag_groups("(a_(character) b) (~b_(artist) ~c_(species)) (~pony_(mlp) ~pony_(eg) -pony)")
tag_groups.each do |tag_group|
  puts tag_group
end
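The proof-of-concept above only emits a group when a top-level `)` closes it, so bare tags outside parentheses get merged into the next group or dropped. One possible variation, sketched here as a hypothetical `split_top_level` helper, splits on whitespace only at nesting depth 0, so bare tags and groups both come out as separate tokens. It assumes parentheses inside tag names are balanced and rejects input where they are not.

```ruby
# Hypothetical helper, not from e621ng: split a query on whitespace that
# sits outside any parentheses, keeping bare tags and groups as tokens.
def split_top_level(input)
  tokens = []
  current = +""
  depth = 0

  input.each_char do |char|
    depth += 1 if char == "("
    depth -= 1 if char == ")"
    raise ArgumentError, "unbalanced parentheses" if depth.negative?

    if depth.zero? && char.match?(/\s/)
      # Whitespace at the top level ends the current token.
      tokens << current unless current.empty?
      current = +""
    else
      current << char
    end
  end

  raise ArgumentError, "unbalanced parentheses" unless depth.zero?
  tokens << current unless current.empty?
  tokens
end

split_top_level("e (a_(character) b) -c (~pony_(mlp) ~pony_(eg))")
# => ["e", "(a_(character) b)", "-c", "(~pony_(mlp) ~pony_(eg))"]
```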
> danbooru has an implementation of this in `def expr`
...this is why I should fully read the issue before starting to do anything. I imagine that if Danbooru has this implemented, they've already solved the above problem? I imagine we've diverged too far at this point to just pull this in, though.
Danbooru has a restriction around parentheses: they have to match within the tag (every opening parenthesis needs a closing one), so you can't have unbalanced parentheses:
- `a_(b)` - valid
- `a_b)` - invalid
- `a(` - invalid
https://github.com/danbooru/danbooru/blob/0ec753c5148011bba5763cdef66abaaee9aef086/app/models/tag.rb#L7
https://github.com/danbooru/danbooru/blob/0ec753c5148011bba5763cdef66abaaee9aef086/app/logical/tag_name_validator.rb#L21
https://github.com/danbooru/danbooru/blob/0ec753c5148011bba5763cdef66abaaee9aef086/config/initializers/core_extensions.rb#L88
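A minimal sketch of a Danbooru-style balance check, written from the description above rather than copied from Danbooru's code, combined with the earlier suggestion to disallow a leading `(`. Both method names are hypothetical; in e621ng such a rule would presumably live in the tag name validator.

```ruby
# Hypothetical helpers, not Danbooru's or e621ng's actual code.
def balanced_parens?(name)
  depth = 0
  name.each_char do |char|
    depth += 1 if char == "("
    depth -= 1 if char == ")"
    return false if depth.negative? # a ")" with no matching "("
  end
  depth.zero? # every "(" was closed
end

# Combines the balance rule with "tags may not start with (".
def valid_for_grouping?(name)
  !name.start_with?("(") && balanced_parens?(name)
end

["a_(b)", "a_b)", "a(", "(a)_b"].each do |name|
  puts "#{name}: #{valid_for_grouping?(name)}"
end
# a_(b): true
# a_b): false
# a(: false
# (a)_b: false
```

With both rules in place, a `(` at the start of a token always opens a group, and any `)` that leaves a token unbalanced must close one.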
Both of these are good ways to do this, but just using another set of characters that is already disallowed in tag names would bypass these issues entirely.
It is also worth noting that on line 32 of app/logical/tag_name_validator.rb, we specify that tags cannot begin with many of these characters.
grouping with anything but parentheses will be extremely unintuitive
> grouping with anything but parentheses will be extremely unintuitive

So is using `~` as the OR operator, imo.
What do you propose we group with? Percent signs? None of the characters that aren't allowed in tags are good for grouping things together (see e621ng/app/logical/tag_name_validator.rb, lines 7 to 18 at 9b8344d).
> What do you propose we group with? Percent signs? None of the characters that aren't allowed in tags are good for grouping things together
I actually think `{}` would be good: it's very rarely used in tags (1 result on e6), and it still makes sense to use. Something like this would work: master...mm12:e621ng:grouping
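A short sketch of why braces are attractive as delimiters. It assumes tag names are restricted from containing `{` and `}` (they are currently allowed, per the single e6 result above). Under that assumption the tokenizer can split braces off unconditionally, with no balancing rules for tag names. `tokenize_braces` is a hypothetical helper, not the code on the linked branch.

```ruby
# Hypothetical tokenizer: assumes "{" and "}" can never appear in tag names,
# so every brace is a grouping token.
def tokenize_braces(query)
  # Surround each brace with spaces, then split on whitespace.
  query.gsub(/[{}]/) { |brace| " #{brace} " }.split
end

tokenize_braces("{~a_(b) ~c}{d -e} f")
# => ["{", "~a_(b)", "~c", "}", "{", "d", "-e", "}", "f"]
```

Parentheses inside tags such as `a_(b)` pass through untouched, so no space-around-grouping rule is needed.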
OK, so here is what I see the code doing:
- The query_string from the search enters ElasticPostQueryBuilder (e.g. "A -B ~C ~D").
- TagQuery is called with query_string.
  - query_string is parsed into must (ANDed), should (ORed), or must_not (NOTed) clauses (e.g. {"q":{"tags":{"must":["A"],"must_not":["B"],"should":["C","D"]}, "status_must_not":"deleted", "resolve_aliases":true, "tag_count":4}}).
- ElasticPostQueryBuilder then passes this to ElasticQueryBuilder (its superclass), which turns these into an actual Elasticsearch query:
{"query":
  {"bool":
    {
      "must":[{"term":{"tags":"A"}}],
      "must_not":[{"term":{"tags":"B"}}],
      "should":[
        {"term":{"tags":"C"}},
        {"term":{"tags":"D"}}
      ],
      "minimum_should_match":1
    }
  },
  "sort":[{"id":"desc"}],
  "_source":false,
  "timeout":"9000ms"
}
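The flat pipeline described above can be sketched in a few lines. Both helpers are hypothetical simplifications: e621ng's real TagQuery and ElasticQueryBuilder also handle metatags, aliases, wildcards, and more. The key point is that there is exactly one `bool`, so `-` and `~` can only apply query-wide.

```ruby
require "json"

# Hypothetical simplification of TagQuery: every tag lands in one of three
# query-wide buckets, chosen by its prefix.
def flat_parse(query_string)
  buckets = { must: [], must_not: [], should: [] }
  query_string.split.each do |token|
    case token[0]
    when "-" then buckets[:must_not] << token[1..]
    when "~" then buckets[:should] << token[1..]
    else buckets[:must] << token
    end
  end
  buckets
end

# Hypothetical simplification of ElasticQueryBuilder: each bucket becomes a
# list of term clauses inside a single bool query.
def flat_to_es(buckets)
  term = ->(tag) { { term: { tags: tag } } }
  bool = buckets.transform_values { |tags| tags.map(&term) }
  # Design choice in this sketch: only require a should match if ~ tags exist.
  bool[:minimum_should_match] = 1 unless buckets[:should].empty?
  { query: { bool: bool }, sort: [{ id: "desc" }], _source: false, timeout: "9000ms" }
end

puts JSON.pretty_generate(flat_to_es(flat_parse("A -B ~C ~D")))
```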
Basically, to implement grouping in Elasticsearch, this structure needs to be recursive:
{"query":
  {"bool":
    {
      "must":[{
        "bool":{"must":[...], ...}
      }],
      "must_not":[...],
      "should":[...],
      "minimum_should_match":1
    }
  },
  ...
}
for any parts you want to group.
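A minimal sketch of that recursion, assuming a hypothetical nested-array tree (a String is a tag; `[:and, ...]`, `[:or, ...]` and `[:not, x]` are groups). Each group becomes its own `bool` query nested inside its parent's clause list. This is an illustration, not e621ng's builder. For the AND case, Elasticsearch's `filter` clause could be used instead of `must` if scoring is not needed.

```ruby
require "json"

# Hypothetical tree format (not from e621ng): a String is a tag,
# [:and, *children], [:or, *children] and [:not, child] are groups.
def to_es(node)
  return { term: { tags: node } } if node.is_a?(String)

  op, *children = node
  case op
  when :and then { bool: { must: children.map { |c| to_es(c) } } }
  when :or  then { bool: { should: children.map { |c| to_es(c) }, minimum_should_match: 1 } }
  when :not then { bool: { must_not: [to_es(children.fetch(0))] } }
  else raise ArgumentError, "unknown operator #{op.inspect}"
  end
end

# (~A ~B) (~C ~D) E
tree = [:and, [:or, "A", "B"], [:or, "C", "D"], "E"]
puts JSON.pretty_generate({ query: to_es(tree) })
```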
I believe "boost" could also be used to make posts that match more groups "more important", which might be useful as a tiebreaker when ordering?
First, I want to preface this with: I am not a Ruby dev. Until like last week, I never put any thought into Ruby... ever.
However, I implemented this on my database mirror using JavaScript, and I would like to see it eventually on the main site. So, I've gotten started on my fork here: https://github.com/DontTalkToMeThx/e621ng/tree/tag-syntax
It follows my existing advanced search syntax, which is detailed here. Because this is a (massively) breaking change to the syntax, I'd like it to be an option users can enable in their settings, passed along as a query param for the API.
It does not currently support metatags; it only works with plain tag searches right now (idk if wildcards work). I do not want to sink too much time into something that won't ever make its way onto the site, so I did this as a proof of concept. It is not currently optimized; again, I am NOT a Ruby dev. If you wish to test it, you need to add &use_new_syntax=true
to your query string in the URL bar as I have no idea how to add a persistent setting at the moment.
It is identical to my other syntax, so you can use groups within groups, negation, etc. For example, ( female ~ male ) ( solo ~ duo ) would find posts that contain a female or male and are either solo or duo, which is not possible with the current syntax because it has no grouping. If you want to test this in your local version, this is the URL I use: http://localhost:3000/posts?tags=%28+female+~+male+%29+%28+solo+~+duo+%29&use_new_syntax=true
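For readers unfamiliar with how such syntax gets parsed, here is a recursive-descent sketch for space-delimited queries like `( female ~ male ) ( solo ~ duo )`. The rules are assumptions for illustration, not taken from the fork or its syntax docs: `(` and `)` are separate tokens, adjacent terms are ANDed, `~` is an infix OR that binds tighter than the implicit AND, and `-tag` negates a single tag. The hypothetical `GroupParser` produces a nested tree that a recursive Elasticsearch builder could walk.

```ruby
# Hypothetical parser; the grammar below is an assumption, not the fork's.
#   and_expr := or_expr+            (stops at ")" or end of input)
#   or_expr  := term ("~" term)*
#   term     := "(" and_expr ")" | "-"tag | tag
class GroupParser
  def initialize(query)
    @tokens = query.split
    @pos = 0
  end

  def parse
    tree = parse_and
    raise ArgumentError, "unexpected #{peek.inspect}" unless peek.nil?
    tree
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def parse_and
    terms = []
    terms << parse_or until peek.nil? || peek == ")"
    raise ArgumentError, "empty group" if terms.empty?
    terms.size == 1 ? terms.first : [:and, *terms]
  end

  def parse_or
    terms = [parse_term]
    while peek == "~"
      advance
      terms << parse_term
    end
    terms.size == 1 ? terms.first : [:or, *terms]
  end

  def parse_term
    token = advance
    case token
    when nil, ")", "~", "-" then raise ArgumentError, "unexpected #{token.inspect}"
    when "("
      inner = parse_and
      raise ArgumentError, "missing )" unless advance == ")"
      inner
    else
      token.start_with?("-") ? [:not, token[1..]] : token
    end
  end
end

GroupParser.new("( female ~ male ) ( solo ~ duo )").parse
# => [:and, [:or, "female", "male"], [:or, "solo", "duo"]]
```

Whether `~` should bind tighter or looser than the implicit AND is a design decision; this sketch picks tighter, so `a b ~ c` means `a` AND (`b` OR `c`).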
I wish for this to be used as a starting point for further development, mainly to show that this is possible, and I think we should eventually have some kind of grouping. I will make this fully feature complete with metatags, ordering, etc if it's deemed something that would actually be considered for the site.
My full implementation (in JS) is open source as well, and it's mainly what I'm moving over here. One of the main differences is that the builder wasn't designed with grouping in mind, since a lot of the metatags are top-level only, so I'll need to implement my metatag parser to inject the meta query into the correct location, which shouldn't be too hard.
Booru on Rails has a relatively feature-complete search parser which generates Elasticsearch queries, and has accompanying tests:
https://github.com/derpibooru/booru-on-rails/blob/master/lib/search_parser.rb
https://github.com/derpibooru/booru-on-rails/blob/master/test/lib/search_parser_test.rb
While I would advise against using it directly (due to licensing issues, and to be honest it isn't tailored to the needs of this software), there shouldn't be any reason you couldn't use a similar grammar.