GreptimeTeam / promql-parser

PromQL Rust parser

Repository from GitHub: https://github.com/GreptimeTeam/promql-parser

Failing to parse valid regex

fbs opened this issue

The following regex fails to parse with the Rust parser, but works with the official Go parser:

uri=~"/v[1-9]/.*/{gid}/{uid}"

Reproduce

use promql_parser::parser;

fn main() {
    let promql = r#"rate(http_server_requests_seconds_bucket{method="GET",status=~"2..",uri=~"/v[1-9]/.*/{gid}/{uid}",le="0.25"}[1h])"#;

    match parser::parse(promql) {
        Ok(expr) => {
            println!("Prettify:\n\n{}", expr.prettify());
            println!("AST:\n{expr:?}");
        }
        Err(info) => println!("Err: {info:?}"),
    }
}
// Prints: Err: "illegal regex for /v[1-9]/.*/{gid}/{uid}"

The same query works in Go:

package main

import (
	"github.com/prometheus/prometheus/promql/parser"
)

func main() {
	_, err := parser.ParseExpr(`rate(http_server_requests_seconds_bucket{method="GET",status=~"2..",uri=~"/v[1-9]/.*/{gid}/{uid}",le="0.25"}[1h])`)

	if err != nil {
		panic(err)
	}
}

Maybe it's because of a difference between the regex implementations of Rust and Go. I will try to fix it.
Thanks for your report, and any PR is welcome.

Hi @yuanbohan, wow, a faster answer than I expected <3. I was taking a quick look as well:

For reproducing:

use regex::Regex;
fn main() {
    let re = r#"/v[1-9]/.*/{gid}/{uid}"#;

    // Panics: the regex crate reads `{gid}` as a malformed repetition quantifier.
    Regex::new(re).unwrap();
}

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    /v[1-9]/.*/{gid}/{uid}
                ^
error: repetition quantifier expects a valid decimal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)', src/main.rs:20:21

Which makes sense, but I'm not sure what a good fix is here.
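To make the error message above concrete, here is a minimal std-only sketch of the check it implies: the text between `{` and `}` only counts as a repetition when it is `n`, `n,` or `n,m` with decimal numbers. The names `BraceBody` and `classify_brace_body` are hypothetical and not part of the regex crate or promql-parser; the sketch also assumes that Go's parser falls back to treating any other brace group as literal text, which would explain why Prometheus accepts the query.

```rust
/// What the text between `{` and `}` means when read as a repetition count.
#[derive(Debug, PartialEq)]
enum BraceBody {
    Exact(u32),      // {n}
    AtLeast(u32),    // {n,}
    Range(u32, u32), // {n,m}
    NotACount,       // anything else, e.g. {gid}
}

/// Hypothetical helper: classify the body of a brace group.
fn classify_brace_body(body: &str) -> BraceBody {
    // Only plain ASCII digits count; `str::parse::<u32>` alone would also
    // accept a leading `+`, so the digits are checked explicitly first.
    let number = |s: &str| -> Option<u32> {
        if !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) {
            s.parse().ok()
        } else {
            None
        }
    };
    match body.split_once(',') {
        None => number(body).map_or(BraceBody::NotACount, BraceBody::Exact),
        Some((min, "")) => number(min).map_or(BraceBody::NotACount, BraceBody::AtLeast),
        Some((min, max)) => match (number(min), number(max)) {
            (Some(lo), Some(hi)) => BraceBody::Range(lo, hi),
            _ => BraceBody::NotACount,
        },
    }
}
```

Under these assumptions `gid` and `uid` both classify as `NotACount`, which is the case the Rust parser reports as an error instead of matching the braces literally.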

[Screenshot: promql]

Just to verify that Prometheus itself handles the regex as expected (or not expected): it does, the query shows series.

Maybe you can try other symbols in your uri, like <, _?
Or just escape the curly brackets using \ in your regex?

let re = r#"/v[1-9]/.*/\{gid\}/\{uid\}"#;
let cap = Regex::new(&re).unwrap().captures("/v1/foo/{gid}/{uid}");
println!("{cap:?}");

Output:

Some(Captures({0: 0..19/"/v1/foo/{gid}/{uid}"}))

Is this what you want? @fbs

For a bit of background (should've added that earlier):

We run a shared Prometheus platform used by many teams. I want to do some analysis of their rules to help teams with some company specifics (architecture). As it's an all-Python team, I'm using this library, since pyo3 makes that really easy.

So changing the regexes isn't an option: they come from end users and are valid for Prometheus itself.

I guess it's hard to fix, as it's a regex implementation difference between Go and Rust. If there was a 'less strict' regex flag I would be OK with that. What do you think?

I'm open to implementing something for it

I guess it's hard to fix, as it's a regex implementation difference between Go and Rust. If there was a 'less strict' regex flag I would be OK with that. What do you think?

From the regex lib's docs, I can't find any configuration to control such behavior 🙁

Not a regex expert, but what do you think of adding a "preprocessing" step that modifies the input regex, like @yuanbohan said:

Maybe you can try other symbols in your uri, like <, _? Or just escape the curly brackets using \ in your regex?

I'm not sure if this is viable...

I did indeed end up adding some preprocessing in #76. It's not the nicest implementation, but it seems to work. I'm a bit worried about other edge cases I may have missed.
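For readers wondering what such a preprocessing step could look like, below is a rough std-only sketch. It is not the code from #76: the function names are hypothetical, and it assumes the goal is to escape every `{` or `}` that is not part of a well-formed `{n}`, `{n,}` or `{n,m}` quantifier before handing the pattern to the regex crate. Existing backslash escapes are copied through untouched, including the brace forms of `\p{..}`, `\P{..}` and `\x{..}`.

```rust
/// True if `body` (the text between `{` and `}`) is `n`, `n,` or `n,m`.
fn is_quantifier_body(body: &str) -> bool {
    let is_number = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    match body.split_once(',') {
        None => is_number(body),
        Some((min, max)) => is_number(min) && (max.is_empty() || is_number(max)),
    }
}

/// Hypothetical preprocessing step: escape braces that cannot be quantifiers.
fn escape_literal_braces(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len() + 8);
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '\\' => {
                // Copy an existing escape verbatim so it is never escaped twice.
                out.push('\\');
                i += 1;
                if let Some(&next) = chars.get(i) {
                    out.push(next);
                    i += 1;
                    // `\p{..}`, `\P{..}` and `\x{..}` use braces as part of the escape.
                    if matches!(next, 'p' | 'P' | 'x') && chars.get(i) == Some(&'{') {
                        while i < chars.len() {
                            let c = chars[i];
                            out.push(c);
                            i += 1;
                            if c == '}' {
                                break;
                            }
                        }
                    }
                }
            }
            '{' => {
                // Find the nearest closing brace and inspect what lies between.
                let close = chars[i + 1..].iter().position(|&c| c == '}').map(|p| i + 1 + p);
                match close {
                    Some(end) if is_quantifier_body(&chars[i + 1..end].iter().collect::<String>()) => {
                        // A real quantifier: keep it exactly as written.
                        out.extend(chars[i..=end].iter());
                        i = end + 1;
                    }
                    _ => {
                        out.push_str("\\{");
                        i += 1;
                    }
                }
            }
            '}' => {
                // A closing brace that did not end a quantifier is literal.
                out.push_str("\\}");
                i += 1;
            }
            c => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}
```

With the pattern from this issue, the sketch yields `/v[1-9]/.*/\{gid\}/\{uid\}`, the same escaped form suggested earlier in the thread. It does not try to cover every edge case, for example a quantifier with nothing in front of it to repeat.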