TobikoData / sqlmesh

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Home Page:https://sqlmesh.com


Invalid SQL generated for the ROW type in Trino unit test fixtures

erindru opened this issue

Here is another Trino edge case. I'm not sure whether the bug is in SQLMesh or SQLGlot.

Given a model like so:

MODEL (
    name test.test_model,
    kind FULL
);

select
    max(meta.day) as day,
    meta.type as type
from test.metadata
group by 1, 2

And a test like so:

test_correct_grouping:
  model: test.test_model
  inputs:
    test.metadata:
      columns:
        meta: ROW(day date, type varchar)
      rows:
        - meta:
            day: 2024-03-30
            type: foo
        - meta:
            day: 2024-03-31
            type: bar
        - meta:
            day: 2024-04-01
            type: baz
  outputs:
    query:
      rows:
        - day: 2024-04-01
          type: baz

When running (against a default_test_connection pointed at Trino), it fails:

$ sqlmesh --debug test tests/test_model.yaml 
E
======================================================================
ERROR: test_sqlmesh (tests/test_model.yaml)
----------------------------------------------------------------------
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=MISMATCHED_COLUMN_ALIASES, message="line 1:147: Column alias list has 1 entries but 't' has 2 columns available", query_id=20240430_192625_00091_yysrm)

The reason for this failure is the SQL that SQLMesh generates to create the test fixture. From the logs, it tries to execute:

CREATE OR REPLACE VIEW datalake.sqlmesh_test_g087nhkb."datalake__test__metadata" AS
SELECT CAST(meta AS ROW(day DATE, type VARCHAR)) AS meta
FROM (
  VALUES
    (CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-03-31' AS DATE), 'bar') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-04-01' AS DATE), 'baz') AS ROW(day DATE, type VARCHAR)))
) AS t(meta)

The problem is that when CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR)) is put inside VALUES(), Trino unpacks the ROW into two columns. This can be shown like so:

Running SELECT CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR)) correctly produces a ROW:

_col0
{day=2024-03-30, type=foo}

However, wrapping it in VALUES by running SELECT * FROM ( VALUES ( CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR)) ) ) "helpfully" unpacks the ROW into multiple columns:

_col0 _col1
2024-03-30 foo

This is what causes the error: Column alias list has 1 entries but 't' has 2 columns available

A working query would need to reassemble the ROW from the unpacked top-level columns, something like:

SELECT CAST((t.col1, t.col2) AS ROW(day DATE, type VARCHAR)) AS meta
FROM (
  VALUES
    (CAST(ROW(CAST('2024-03-30' AS DATE), 'foo') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-03-31' AS DATE), 'bar') AS ROW(day DATE, type VARCHAR))),
    (CAST(ROW(CAST('2024-04-01' AS DATE), 'baz') AS ROW(day DATE, type VARCHAR)))
) AS t(col1, col2)
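
For illustration, here is a minimal Python sketch of how a fixture query of this shape could be generated from the test YAML. It is not SQLMesh's actual implementation: the function names (sql_literal, build_row_fixture_query) and the col1, col2 aliases are hypothetical. Note that it deviates slightly from the query above by emitting one plain value per field inside VALUES instead of a ROW, so the result does not depend on Trino's unpacking behaviour. It also assumes every field value can be written as a quoted string literal and cast to its declared type.

```python
from typing import Dict, Sequence


def sql_literal(value: object, sql_type: str) -> str:
    """Render a value as a quoted string literal CAST to the given SQL type."""
    escaped = str(value).replace("'", "''")
    return f"CAST('{escaped}' AS {sql_type})"


def build_row_fixture_query(
    column: str,
    fields: Dict[str, str],
    rows: Sequence[Dict[str, object]],
    alias: str = "t",
) -> str:
    """Build a SELECT that yields a single ROW-typed column from flat VALUES."""
    row_type = "ROW(" + ", ".join(f"{n} {t}" for n, t in fields.items()) + ")"
    # One plain (non-ROW) value per field, so VALUES has nothing to unpack.
    values = ",\n    ".join(
        "(" + ", ".join(sql_literal(r[n], t) for n, t in fields.items()) + ")"
        for r in rows
    )
    # Hypothetical positional aliases for the flattened columns.
    flat_cols = [f"col{i}" for i in range(1, len(fields) + 1)]
    packed = ", ".join(f"{alias}.{c}" for c in flat_cols)
    return (
        f"SELECT CAST(ROW({packed}) AS {row_type}) AS {column}\n"
        f"FROM (\n  VALUES\n    {values}\n) AS {alias}({', '.join(flat_cols)})"
    )


if __name__ == "__main__":
    print(
        build_row_fixture_query(
            "meta",
            {"day": "DATE", "type": "VARCHAR"},
            [
                {"day": "2024-03-30", "type": "foo"},
                {"day": "2024-03-31", "type": "bar"},
                {"day": "2024-04-01", "type": "baz"},
            ],
        )
    )
```

A real fix would also have to handle nested ROWs, NULLs, and tables that mix ROW columns with ordinary ones, none of which this sketch attempts.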

Alternatively, instead of SQLMesh generating the fixture query from the YAML, maybe we could add a feature that lets the user supply their own SELECT query to produce the data?

commented

Interesting! Yeah this looks like another SQLMesh edge case. Thanks for reporting Erin, I'll take a look soon.