PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

Home Page:https://prql-lang.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

incorrect range in window function

korbash opened this issue · comments

What happened?

I see two problems. Here, I will present the first one.
In windows, a range of a single number is not processed correctly.
(output from playground)

PRQL input

from login_event
group track_id (
    window rows:-1..-1 (
        sort time_upload
        derive {
            last_user = min user_id
            }
    )
)

SQL output

SELECT
  *,
  MIN(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_user
FROM
  login_event

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

Expected SQL output

SELECT
  *,
  MIN(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
  ) AS last_user
FROM
  login_event

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

Of course, this problem is somewhat artificial since I can use the lag function.

The second problem is again with the range, but now when using the window function first
(output from playground)

PRQL input

from login_event
group track_id (
    window rows:-5..0 (
        sort time_upload
        derive {
            last_user = first user_id
            }
    )
)

SQL output

SELECT
  *,
  FIRST_VALUE(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload
  ) AS last_user
FROM
  login_event

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

Expected SQL output

SELECT
  *,
  FIRST_VALUE(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
  ) AS last_user
FROM
  login_event

Thanks for the issues @korbash

For the first, here's a smaller example:

from login_event
window rows:1..1 (
  sort time_upload
  derive {
      last_user = min user_id
  }
)
SELECT
  *,
  MIN(user_id) OVER (
    ORDER BY
      time_upload ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_user
FROM
  login_event
ORDER BY
  time_upload

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

If the range expression changes to 1..2, this works correctly, so something is breaking when the values are the same. I had a look for a few minutes but couldn't find where exactly. It's before generating the RQ, since the RQ contains:

          window:
            frame:
              kind: Rows
              range:
                start: null
                end: null
            partition: []