Query planer doesn't work for multiple alternations in a row
hsqStephenZhang opened this issue · comments
High level details
I added a unittest of plan("[ab][cd][ef]")
, which is a query parse testcase in google's code search project(I know it's pretty old), which is expected to expand into "ace, acf, ... bdf", but the test output failed to inline the query strings
Current situation
- how it's insufficient: After reading the code, I found that current optimizer won't handle the scenario that has over 3 alternations in a row. I'm not sure if it's in anticipation
- motivations: I'm learning code search techs and comparing google's code search engine's with bloop, the boolean query builder seems to have some diffs, so I'm figuring it out.
- why we need to change this: To make the query optimization more sufficient and cover more cases.
Proposal
I've implemented some changes, and have passed the test( plan("(a|b)(c|d)(e|f)(g|h)")
) added by my self.
in server/bleep/src/query/planner.rs
, The Fragment:::add
need to handle the following match arm:
// TODO: do we need to handle cases where the children are not all literals?
(Fragment::Literal(lit), Fragment::Dense(Op::Or, mut rhs)) => {
if rhs.iter().all(|x| matches!(x, Fragment::Literal(_))) {
for x in &mut rhs {
if let Fragment::Literal(s) = x {
*s = lit.clone() + s;
}
}
Fragment::Dense(Op::Or, rhs)
} else {
Fragment::Dense(
Op::And,
vec![Fragment::Literal(lit), Fragment::Dense(Op::Or, rhs)],
)
}
}
// TODO: avoid explosion by adding some constraints
// TODO: do we need to handle cases where the children are not all literals?
(Fragment::Dense(Op::Or, lhs), Fragment::Dense(Op::Or, rhs)) => {
if lhs.iter().all(|x| matches!(x, Fragment::Literal(_)))
&& lhs.len() <= 100
&& rhs.iter().all(|x| matches!(x, Fragment::Literal(_)))
&& rhs.len() <= 100
{
let mut cross = Vec::with_capacity(lhs.len() * rhs.len());
for x in &lhs {
for y in &rhs {
let x = x.as_literal().unwrap();
let y = y.as_literal().unwrap();
cross.push(Fragment::Literal(x.to_owned() + y));
}
}
Fragment::Dense(Op::Or, cross)
} else {
Fragment::Dense(
Op::And,
vec![Fragment::Dense(Op::Or, lhs), Fragment::Dense(Op::Or, rhs)],
)
}
}
and the function optimize::run
should call flattern_or
after the calling of inline
Next steps
Are there any next steps after we implement these changes?
Given the content of 'acgf', the regex query "[ab][cd][ef]" will recall this doc, and it's unexpected, since this regex query has a determined set of matched trigrams.