PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

Home Page: https://prql-lang.org

Compiler crash on `from_text` with zero rows

kgutwin opened this issue

What happened?

It should be possible (not very useful, perhaps, but still possible) to use from_text to define a relation that has known columns but no rows. At a minimum, such a definition ought not to crash the compiler.

PRQL input

from_text "col_a,col_b"
# or, in JSON format
from_text format:json '{"columns": ["col_a","col_b"], "data": []}'

SQL output

n/a, crashes

Expected SQL output

-- could be something like this
WITH table_0 AS (
  SELECT
    NULL AS col_a,
    NULL AS col_b
  WHERE 1 = 0
)
SELECT
  col_a,
  col_b
FROM
  table_0
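
As a quick sanity check of the SQL above, here is a minimal sketch that runs it against an in-memory SQLite database using Python's standard sqlite3 module. The helper name run_query is made up for illustration, and SQLite is only one dialect; other databases were not checked here.

```python
import sqlite3

# The "expected SQL output" from the issue, copied as-is.
EXPECTED_SQL = """
WITH table_0 AS (
  SELECT
    NULL AS col_a,
    NULL AS col_b
  WHERE 1 = 0
)
SELECT
  col_a,
  col_b
FROM
  table_0
"""


def run_query(sql):
    """Hypothetical helper: run `sql` in a throwaway in-memory SQLite
    database and return (column names, rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.execute(sql)
        # cursor.description carries the column names even when no rows come back.
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
        return columns, rows
    finally:
        conn.close()


columns, rows = run_query(EXPECTED_SQL)
print(columns)  # column names are known
print(rows)     # no rows
```

In SQLite this yields the two column names and an empty row list, which is the "known columns, zero rows" relation the issue asks for.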

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

The message on crash is:

The application panicked (crashed).
Message:  removal index (is 0) should be < len (is 0)
Location: prqlc/prqlc/src/sql/gen_query.rs:446

Tested with latest code, commit d642a30

I'm not sure that the provided SQL output is the best choice moving forward, but I'm making a PR that at least provides an error message for this situation and prevents a crash.

@kgutwin What is your use case for this?

  1. Is this a kind of schema definition/declaration, or
  2. are you processing files which sometimes don't have any data?

For case 2, the PR from @KaeporaGaebora should help (and is greatly appreciated 🙏).

For case 1, this kind of thing should be handled by the type system once it is implemented.

Please note that the following works:

from_text "col_a,col_b\n,"

You can produce the SQL you provided with the following:

from [{col_a=null, col_b=null}]

HTH

Thanks - the use case might be a little uncommon, but our team is working on code that takes user input and generates PRQL documents. As part of that process, it's typical for users to generate empty relations with known columns. It's not very useful to them -- they will typically immediately begin adding rows -- but since our app generates and executes the PRQL "live", having a compiler error on an empty relation isn't something we want the users to need to deal with.

We can, of course, catch the compiler error and handle it smoothly, so the solution proposed by @KaeporaGaebora is indeed workable. I do wonder, though, whether an error is the right solution, given that an empty relation is definitely not an error in SQL. As of now, we don't have a specific need to be able to create an empty relation, but I suppose it might come up at some point in the future.

I did want to note that the PRQL snippets you suggested produce subtly different output from what I had expected. In particular, the last one is missing the WHERE 1 = 0 clause, which is what prevents the inner SELECT from generating a row. For what it's worth, WHERE 1 = 0 appears to be the most portable way of generating zero rows; see StackOverflow: https://stackoverflow.com/questions/18021915/is-there-a-sql-query-that-will-always-return-zero-results
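
To make the difference concrete, here is a small sketch comparing the inner SELECT with and without the WHERE 1 = 0 clause, again using Python's sqlite3. The two SQL strings are hand-written stand-ins, not actual prqlc output for the snippets above, and count_rows is a made-up helper name.

```python
import sqlite3


def count_rows(sql):
    """Hypothetical helper: run `sql` in an in-memory SQLite database
    and return how many rows it produces."""
    conn = sqlite3.connect(":memory:")
    try:
        return len(conn.execute(sql).fetchall())
    finally:
        conn.close()


# Hand-written stand-ins (assumption: not copied from compiler output).
without_filter = "SELECT NULL AS col_a, NULL AS col_b"
with_filter = without_filter + " WHERE 1 = 0"

print(count_rows(without_filter))  # 1: a single row of NULLs
print(count_rows(with_filter))     # 0: same columns, no rows
```

Without the filter, downstream code sees one all-NULL row rather than an empty relation, which matters for anything that counts or aggregates rows.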

I definitely think it should be possible to produce from_text with zero rows of data.

I'm not sure whether it should be possible to produce from_text with zero data, which implies zero columns. I don't see a compelling reason why not, but I'm not confident and am open to thoughts!

Thanks for adding the error @KaeporaGaebora , that's already better.