Invalid SQL generated for the ROW type in Trino unit test fixtures
erindru opened this issue
Here is another Trino edge case; I'm not sure whether it's a SQLMesh or a SQLGlot issue.
Given a model like so:
MODEL (
  name test.test_model,
  kind FULL
);

select
  max(meta.day) as day,
  meta.type as type
from test.metadata
group by 1, 2
And a test like so:
test_correct_grouping:
  model: test.test_model
  inputs:
    test.metadata:
      columns:
        meta: ROW(day date, type varchar)
      rows:
        - meta:
            day: 2024-03-30
            type: foo
        - meta:
            day: 2024-03-31
            type: bar
        - meta:
            day: 2024-04-01
            type: baz
  outputs:
    query:
      rows:
        - day: 2024-04-01
          type: baz
When running (against a default_test_connection pointed at Trino), it fails:
$ sqlmesh --debug test tests/test_model.yaml
E
======================================================================
ERROR: test_sqlmesh (tests/test_model.yaml)
----------------------------------------------------------------------
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=MISMATCHED_COLUMN_ALIASES, message="line 1:147: Column alias list has 1 entries but 't' has 2 columns available", query_id=20240430_192625_00091_yysrm)
The reason for this failure is the SQL that SQLMesh generates to create the test fixture. From the logs, it tries to execute:
CREATE OR REPLACE VIEW datalake.sqlmesh_test_g087nhkb."datalake__test__metadata" AS
SELECT CAST(meta AS ROW(day DATE, type VARCHAR)) AS meta
FROM (
  VALUES
    (CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-03-31' AS DATE), 'bar') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-04-01' AS DATE), 'baz') AS ROW(day DATE, type VARCHAR)))
) AS t(meta)
The problem is that when CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR)) is put inside VALUES(), Trino unpacks the ROW into two columns. This can be shown like so:
Running SELECT CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR)) correctly produces a ROW:
_col0 |
---|
{day=2024-03-30, type=foo} |
However, wrapping it in VALUES by running SELECT * FROM ( VALUES ( CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR)) ) ) "helpfully" unpacks the ROW into multiple columns:
_col0 | _col1 |
---|---|
2024-03-30 | foo |
This is what causes the error: Column alias list has 1 entries but 't' has 2 columns available
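To make the mismatch concrete, here is a small Python sketch that models the behavior described above. It does not talk to Trino; `flattened_width` and `check_alias_list` are hypothetical names, and the check only imitates the shape of the error message from the log.

```python
def flattened_width(row_field_count):
    """Columns the derived table exposes when a lone ROW value sits inside VALUES.

    Models the behavior reported above: each field of the ROW becomes
    its own top-level column.
    """
    return row_field_count


def check_alias_list(table_alias, column_aliases, available_columns):
    """Imitate the alias-count validation; raise ValueError on a mismatch."""
    if len(column_aliases) != available_columns:
        raise ValueError(
            f"Column alias list has {len(column_aliases)} entries but "
            f"'{table_alias}' has {available_columns} columns available"
        )


# ROW(day DATE, type VARCHAR) has two fields, so the derived table has two columns.
width = flattened_width(2)

message = None
try:
    check_alias_list("t", ["meta"], width)  # the generated SQL uses t(meta)
except ValueError as exc:
    message = str(exc)

check_alias_list("t", ["col1", "col2"], width)  # one alias per field: no error
```

With one alias and two available columns, the first check fails in the same way the generated fixture query does.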
The correct syntax re-assembles the ROW type from the top-level columns, something like:
SELECT CAST((t.col1, t.col2) AS ROW(day DATE, type VARCHAR)) AS meta
FROM (
  VALUES
    (CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-03-31' AS DATE), 'bar') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-04-01' AS DATE), 'baz') AS ROW(day DATE, type VARCHAR)))
) AS t(col1, col2)
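A fixture generator could apply the same idea roughly as sketched below. This is a minimal Python illustration, not SQLMesh's actual implementation: `build_row_fixture_sql`, `sql_literal`, and the `col1`, `col2`, ... aliases are hypothetical, and the sketch assumes a single ROW-typed input column whose field values are plain dates or strings. Unlike the query above, it emits one plain value per field inside VALUES and uses an explicit ROW(...) constructor when re-assembling, so the alias list always matches the column count.

```python
from datetime import date


def sql_literal(value, sql_type):
    """Render a Python value as a SQL literal, casting dates to their SQL type."""
    if isinstance(value, date):
        return f"CAST('{value.isoformat()}' AS {sql_type})"
    if isinstance(value, str):
        escaped = value.replace("'", "''")  # double any embedded single quotes
        return f"'{escaped}'"
    return str(value)


def build_row_fixture_sql(column, fields, rows):
    """Build a SELECT that re-assembles a ROW column from flattened VALUES columns.

    column: name of the ROW-typed column, e.g. "meta"
    fields: list of (field_name, sql_type) pairs describing the ROW
    rows:   list of dicts mapping field_name -> Python value
    """
    row_type = "ROW(" + ", ".join(f"{name} {typ}" for name, typ in fields) + ")"
    aliases = [f"col{i}" for i in range(1, len(fields) + 1)]
    values = ",\n".join(
        "    (" + ", ".join(sql_literal(row[name], typ) for name, typ in fields) + ")"
        for row in rows
    )
    packed = ", ".join(f"t.{alias}" for alias in aliases)
    return (
        f"SELECT CAST(ROW({packed}) AS {row_type}) AS {column}\n"
        f"FROM (\n  VALUES\n{values}\n) AS t({', '.join(aliases)})"
    )


sql = build_row_fixture_sql(
    "meta",
    [("day", "DATE"), ("type", "VARCHAR")],
    [
        {"day": date(2024, 3, 30), "type": "foo"},
        {"day": date(2024, 3, 31), "type": "bar"},
        {"day": date(2024, 4, 1), "type": "baz"},
    ],
)
print(sql)
```

The printed statement has the same overall shape as the hand-written query above: a two-column VALUES list aliased as t(col1, col2), with the ROW rebuilt in the outer SELECT.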
Alternatively, instead of SQLMesh trying to generate a query from the YAML to produce a test fixture, maybe we could add a feature that allows the user to supply their own SELECT query to produce the data?
Interesting! Yeah this looks like another SQLMesh edge case. Thanks for reporting Erin, I'll take a look soon.