PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

Home Page: https://prql-lang.org

Alphanumeric parameter only works with numbers

kriswuollett opened this issue

What happened?

Tried using the playground to test out an alphanumeric parameter such as $employee_id but it is not keeping the string unchanged in the generated SQL. Snippet from the docs saying that it should work:

Parameter is a placeholder for a value provided after the compilation of the
query.
It uses the following syntax: `$id`, where `id` is an arbitrary alpha numeric
string.
Most database engines only support numeric positional parameter ids (i.e `$3`).

The use case is generating a query for rusqlite, so I believe the database library matters here as well as the dialect: whether the database dialect itself supports a given parameter type is not necessarily the deciding factor.

PRQL input

prql target:sql.sqlite
from employees
filter id == $employee_id

SQL output

SELECT
  *
FROM
  employees
WHERE
  id = $ employee_id

-- Generated by PRQL compiler version:0.9.1 (https://prql-lang.org)

Expected SQL output

SELECT
  *
FROM
  employees
WHERE
  id = $employee_id

-- Generated by PRQL compiler version:0.9.1 (https://prql-lang.org)

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

No response

Caused by the SQL formatter. I don't think there is a non-hacky way to fix it, other than opening a PR to the upstream library.

@aljazerzen, I don't know if you would consider this hacky, but I think it may be possible to take advantage of the library's substitution parameters:

/// Formats whitespace in a SQL string to make it easier to read.
/// Optionally replaces parameter placeholders with `params`.
pub fn format(query: &str, params: &QueryParams, options: FormatOptions) -> String {
    let tokens = tokenizer::tokenize(query);
    formatter::format(&tokens, params, options)
}

I forked their repo to see if I could fix it, but I think the fix actually relates to how substitution parameters are distinguished from non-substitution parameters like :names. Basically, shouldn't it be an error to mix different variable types in the same query, in both your library and theirs?

I think it may be possible to output a SQL template to be formatted after substitution, rather than assuming the input SQL is already final. To try it out, insert this test here:

    #[test]
    fn it_recognizes_question_numbered_placeholders_with_param_values_demo() {
        let input = "SELECT * FROM things WHERE id = $1 LIMIT $2;";
        let params = vec![":id".to_string(), ":count".to_string()];
        let options = FormatOptions::default();
        let expected = indoc!(
            "
            SELECT
              *
            FROM
              things
            WHERE
              id = :id
            LIMIT
              :count;"
        );

        assert_eq!(
            format(input, &QueryParams::Indexed(params), options),
            expected
        );
    }

    #[test]
    fn it_recognizes_question_numbered_placeholders_with_param_values_demo_2() {
        let input = "SELECT * FROM things WHERE id = $1 LIMIT $2;";
        let params = vec!["$id".to_string(), "$count".to_string()];
        let options = FormatOptions::default();
        let expected = indoc!(
            "
            SELECT
              *
            FROM
              things
            WHERE
              id = $id
            LIMIT
              $count;"
        );

        assert_eq!(
            format(input, &QueryParams::Indexed(params), options),
            expected
        );
    }

Perhaps this could even open up the opportunity for the selected SQL dialect to map to an appropriate placeholder type if things like alphanumeric names aren't supported, and to add code comments documenting which name each numbered parameter maps to.

This is great, I hadn't seen this.

So maybe this could be as simple as something which searches the pre-format string for \$\w+, replaces each match with $8001 (or whatever), and adds the variable to the query param map?

(ref #1284, which is a much harder problem, because we need to keep the s-string contents separately before replacing it in)
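A minimal sketch of the text-replacement idea above, using only the standard library. The helper name `extract_params_naive` is hypothetical and not part of prql-compiler or sqlformat-rs. It numbers placeholders from `$1` rather than `$8001`, and it assumes the collected originals would later be handed to the formatter as indexed params.

```rust
/// Replaces every `$name` token in `sql` with a positional `$1`, `$2`, ...
/// and returns the rewritten SQL plus the original tokens, in order.
/// Hypothetical helper for illustration only.
///
/// Note: this is a plain character scan. It knows nothing about string
/// literals or comments, so a `$name` inside quotes is treated as a param too.
fn extract_params_naive(sql: &str) -> (String, Vec<String>) {
    let mut out = String::with_capacity(sql.len());
    let mut params: Vec<String> = Vec::new();
    let mut chars = sql.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Collect the `\w+` part that follows the `$`.
        let mut name = String::new();
        while let Some(&next) = chars.peek() {
            if next.is_alphanumeric() || next == '_' {
                name.push(next);
                chars.next();
            } else {
                break;
            }
        }
        if name.is_empty() {
            // A lone `$` is not a parameter; keep it as-is.
            out.push('$');
        } else {
            // Remember the original token and emit its 1-based position.
            params.push(format!("${name}"));
            out.push_str(&format!("${}", params.len()));
        }
    }
    (out, params)
}
```

Called on `SELECT * FROM employees WHERE id = $employee_id`, this returns the SQL with `$1` in place of the parameter, together with a one-element list holding `$employee_id`.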

Yes, that is what I was guessing: a \$\w+ token is (or could be) a logical variable in PRQL, and the database- or API-dependent variable form is passed to sqlformat-rs as a $n indexed substitution parameter, so the output SQL text shows the expected physical rendering. I see the entry point here:

// formatting
let sql = if options.format {
    let formatted = sqlformat::format(
        &sql,
        &sqlformat::QueryParams::default(),
        sqlformat::FormatOptions::default(),
    );
    formatted + "\n"
} else {
    sql
};

In sqlformat-rs, default() means that no params are used yet:

#[derive(Debug, Clone)]
pub enum QueryParams {
    Named(Vec<(String, String)>),
    Indexed(Vec<String>),
    None,
}

impl Default for QueryParams {
    fn default() -> Self {
        QueryParams::None
    }
}

Yes. I think we could start with a text replacement.

A fuller approach (one that could lead to #1284) would be to take it from the AST (if that's what you meant by "logical variable"...) at

let sql_ast = gen_query::translate_query(query, dialect)?;

But that's harder, and just searching the text would show this approach working...

Ohh, this is interesting and could work!

Text replacement was the hacky thing that I wanted to avoid, because it would fail when formatting this:

SELECT '$1' as normal_string, $1 as paraml;

But if we use the params, they should be parsed correctly. If I understand correctly, this is what we could do:

  • find all params in RQ AST and replace them with positional params,
  • compile the AST to SQL,
  • format and pass the original params to be substituted back in.
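The three steps above can be sketched on a toy AST. Everything here (`Expr`, `extract_params`, `compile_select`) is a hypothetical stand-in, not the real RQ AST or compiler. The sketch covers the first two steps and leaves the third to the formatter, assuming the collected originals would be passed as `QueryParams::Indexed`, as in the test earlier in this thread.

```rust
/// A toy stand-in for an RQ expression; the real RQ AST is much richer.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    StringLiteral(String),
    Param(String),
}

/// Step 1: swap each named param for a positional one and remember the
/// original text so it can be substituted back in after formatting.
fn extract_params(exprs: Vec<Expr>) -> (Vec<Expr>, Vec<String>) {
    let mut originals: Vec<String> = Vec::new();
    let rewritten: Vec<Expr> = exprs
        .into_iter()
        .map(|expr| match expr {
            Expr::Param(name) => {
                originals.push(format!("${name}"));
                // Positional ids are 1-based.
                Expr::Param(originals.len().to_string())
            }
            // Literals and columns are left untouched.
            other => other,
        })
        .collect();
    (rewritten, originals)
}

/// Step 2: compile the toy AST to a SQL string.
/// (No quote escaping; this is only an illustration.)
fn compile_select(exprs: &[Expr]) -> String {
    let items: Vec<String> = exprs
        .iter()
        .map(|expr| match expr {
            Expr::Column(name) => name.clone(),
            Expr::StringLiteral(text) => format!("'{text}'"),
            Expr::Param(id) => format!("${id}"),
        })
        .collect();
    format!("SELECT {}", items.join(", "))
}

// Step 3 (not shown): format the compiled SQL and hand `originals` to the
// formatter as indexed params, so each positional param is replaced by its
// original text.
```

Because the params are found in the tree rather than in the text, a string literal containing `$1` is never registered as a param.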

The nice part is that we can also extract s-strings, guaranteeing that they are not formatted!

Ah, of course. I hadn't realized s-strings were still there in that function, but they are!

@aljazerzen do you think this is possible for @kriswuollett to work on / there's some initial work that they could do? Or too hard with the s-string issue? They scoped this out, and I know we're trying to encourage folks to do an initial PR.

Yes, the hard part will be extracting the params from RQ IR tree before that is compiled to SQL.

Something like this will be needed:

    /// Extracts params (and potentially s-strings)
    /// so they can be substituted back in after formatting.
    struct ParamExtractor {
        param_contents: Vec<String>,
    }

    impl ParamExtractor {
        /// Takes a param content: an arbitrary string that we want to prevent from being formatted.
        /// Returns the id of a positional SQL param (e.g. `3` for `$3`), which will later be
        /// substituted for the content.
        fn push_param(&mut self, param_content: String) -> String {
            self.param_contents.push(param_content);

            // Ids are 1-based: the first param pushed gets id 1.
            self.param_contents.len().to_string()
        }
    }

    impl RqFold for ParamExtractor {
        // Sketch: the signature is elided here.
        fn fold_expr_kind() {
            // ... here we can match params and call `push_param()` with the param name
        }
    }

This can then be called from here: https://github.com/PRQL/prql/blob/ed9977d530ad6b0b649f9ac4e53d4678ede2f0a4/crates/prql-compiler/src/sql/mod.rs#L24C1-L24C1

RqFold will provide ParamExtractor::fold_query(), which must be applied to the query before it is compiled.
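To illustrate how a fold trait can provide `fold_query()` for free, here is a self-contained sketch. The `Fold` trait, the `ExprKind` enum and `fold_children` are simplified, hypothetical analogues. They are not the real `RqFold` or RQ types, whose signatures differ. Storing the content with its leading `$` is also an assumption.

```rust
/// Toy expression tree; a stand-in for the RQ IR, which is much richer.
#[derive(Debug, Clone, PartialEq)]
enum ExprKind {
    Ident(String),
    Param(String),
    Binary(Box<ExprKind>, String, Box<ExprKind>),
}

/// Hypothetical, simplified analogue of `RqFold`: the default methods walk
/// the tree, and an implementor overrides only the cases it cares about.
trait Fold {
    fn fold_expr_kind(&mut self, kind: ExprKind) -> ExprKind {
        fold_children(self, kind)
    }

    /// Provided method: folds every top-level expression of the (toy) query.
    fn fold_query(&mut self, filters: Vec<ExprKind>) -> Vec<ExprKind> {
        filters.into_iter().map(|f| self.fold_expr_kind(f)).collect()
    }
}

/// Recurses into child expressions; leaves are returned unchanged.
fn fold_children<F: Fold + ?Sized>(fold: &mut F, kind: ExprKind) -> ExprKind {
    match kind {
        ExprKind::Binary(left, op, right) => ExprKind::Binary(
            Box::new(fold.fold_expr_kind(*left)),
            op,
            Box::new(fold.fold_expr_kind(*right)),
        ),
        leaf => leaf,
    }
}

struct ParamExtractor {
    param_contents: Vec<String>,
}

impl ParamExtractor {
    /// Stores the content and returns its 1-based positional id.
    fn push_param(&mut self, param_content: String) -> String {
        self.param_contents.push(param_content);
        self.param_contents.len().to_string()
    }
}

impl Fold for ParamExtractor {
    fn fold_expr_kind(&mut self, kind: ExprKind) -> ExprKind {
        match kind {
            // Assumption: the stored content keeps its leading `$`.
            ExprKind::Param(name) => ExprKind::Param(self.push_param(format!("${name}"))),
            other => fold_children(self, other),
        }
    }
}
```

Only `fold_expr_kind` is overridden. The provided `fold_query` reaches params nested inside binary expressions through the default traversal.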

@kriswuollett I'm not sure if you'd be up for starting this (no problem if not — the issue is appreciated regardless). If you would be, we'd be happy to help with any questions or guidance.

Hi @max-sixty, I'd love to but can't get to it at the moment as I'm just starting up on a new project and don't have the bandwidth quite yet.

Testing the OP example code on the playground:

prql target:sql.sqlite
from employees
filter id == $employee_id

results in

SELECT
  *
FROM
  employees
WHERE
  id = $employee_id

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

so it appears that this has been fixed.

Ah nice, I guess sqlformat-rs fixed it and we inherited the fix!