using `select` after `group col (take n)` causes a crash/panic for certain targets (have `DISTINCT ON`)
exit91 opened this issue · comments
What happened?
The compiler panics when compiling the PRQL input given below.
I am using prqlc 0.9.2 on Manjaro Linux x86_64.
PRQL input
prql target:sql.postgres
# prql target:sql.clickhouse
# prql target:sql.duckdb
from a
join b (b.a_id == a.id)
group {a.id} (
  sort b.x
  take 1
)
select {a.id, b.y}
SQL output
--
Expected SQL output
No response
MVCE confirmation
- Minimal example
- New issue
Anything else?
The compiler does not crash when
- adding the column b.x to the final select statement, OR
- changing the target language to anything other than sql.postgres, sql.clickhouse, sql.duckdb
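To make the first workaround concrete, here is a sketch of the original input with b.x added to the final select. That is the only change, and per the observation above it avoids the panic on prqlc 0.9.2. The resulting SQL is not shown in this report, so treat the output as unverified.

```prql
prql target:sql.postgres

from a
join b (b.a_id == a.id)
group {a.id} (
  sort b.x
  take 1
)
# Workaround: also select the sort column b.x
select {a.id, b.y, b.x}
```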
Playground-friendly version of the offending input:
prql target:sql.postgres
# prql target:sql.clickhouse
# prql target:sql.duckdb
from customers
join invoices (invoices.customer_id == customers.customer_id)
group {customers.customer_id} (
  sort {-invoices.total}
  take 1
)
sort { customers.customer_id }
select { customers.customer_id, invoices.billing_country }
stderr output with RUST_LOG=trace: https://gist.github.com/exit91/410a49f7d7b8d7ccf14b3dbf781e7bec
Thanks for the excellent bug report @exit91
Adding select after the #2711 example does not seem to compile correctly. Like:
prql target:sql.duckdb
from employees
group city (
  take 1
)
select last_name
Expected:
WITH table_0 AS (
  SELECT
    DISTINCT ON (city) *
  FROM
    employees
  ORDER BY
    city
)
SELECT
  last_name
FROM
  table_0
FYI slightly smaller example for the panic:
prql target:sql.postgres
from a
group {a.id} (
  sort a.x
  take 1
)
select {a.id}
This cropped up again in #3460
@aljazerzen would you have any direction on this?
FWIW, this is blocking our upgrade from 0.8.1, meaning we've got an almost 4.5 month-old version running in prod
I noticed that using derive instead of select doesn't cause a panic (aggregate does cause a panic).
But I'm not sure if this is ideal behavior.
What is the difference between using and not using the WITH clause? (This may be off topic)
Example 1
prql target:sql.duckdb
from tab1
group col1 (
  take 1
)
derive foo
SELECT
  DISTINCT ON (col1) *,
  foo
FROM
  tab1
-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)
Example 2
prql target:sql.duckdb
from tab1
group col1 (
  take 1
)
derive foo = 1
WITH table_0 AS (
  SELECT
    DISTINCT ON (col1) *
  FROM
    tab1
)
SELECT
  *,
  1 AS foo
FROM
  table_0
-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)
Example 3
prql target:sql.duckdb
from tab1
group col1 (
  take 1
)
derive foo = 1
select foo
-> panic
Hey – just so we could plan accordingly, is there a plan to prioritise this issue, and/or is there any way we could help without deep compiler knowledge?
Sorry for the delay — this really has been outstanding for a while now.
I know we've seen a slowdown in deeper compiler work recently as folks have been prepping for talks (+ I have been v busy at work).
I'll commit to at least look into it this weekend. Thank you for the message @mklopets
FYI I had a look at this, and chatted to @aljazerzen for a moment about it. It's not as simple as I'd hoped. I'll have a further look this week.
Otherwise @aljazerzen we can look in person next weekend!
How about allowing a prql-compiler feature to disable this conversion as a temporary workaround?
> How about allowing a prql-compiler feature to disable this conversion as a temporary workaround?
I think we'd take a PR as a workaround.
I do think we should try and fix. Sorry for the delay. I chatted in person in LA with @aljazerzen about it, but the conversation went higher-level, as it often does. I will really try to block out a few hours and get something working...
From #3526:
An immediate workaround would be to stop using DISTINCT ON in the default target and enable this conversion only in things like an experimental feature?
Interesting. We could restrict DISTINCT ON to only duckdb, or possibly just revert DISTINCT ON entirely until we can fix the issue.
Would reverting DISTINCT ON satisfy @mklopets for the moment? Or is it important for performance?
(This relies on the compiler working without DISTINCT ON, which I haven't confirmed.)
Answering on behalf of @mklopets (we're working together):
Yes, that would work for us 👍