compare_column_values() does not work on dbt Cloud


davesgonechina opened this issue 3 years ago · comments

Describe the bug

The examples for compare_column_values (source) use print_table(), but according to Fishtown Analytics support, print_table() does not work in dbt Cloud. The README should at minimum carry a disclaimer, especially for the Advanced Usage section: there the SQL is generated but the table is not printed, forcing the user to copy, paste, and run each compiled query manually.

Steps to reproduce

Follow the instructions for the usage or advanced usage of audit_helper.compare_column_values() in dbt Cloud, then execute the macro with dbt run-operation.

In my case, this:

{% macro audit_my_table() %}
{%- set columns_to_compare=adapter.get_columns_in_relation(ref('my_new_table'))  -%}

{% set old_etl_relation_query %}
    select * from my_old_table
{% endset %}

{% set new_etl_relation_query %}
    select * from {{ ref('my_new_table') }}
{% endset %}

{% if execute %}
    {% for column in columns_to_compare %}
        {{ log('Comparing column "' ~ column.name ~'"', info=True) }}
        {% set audit_query = audit_helper.compare_column_values(
                a_query=old_etl_relation_query,
                b_query=new_etl_relation_query,
                primary_key="my_pk",
                column_to_compare=column.name
        ) %}

        {% set audit_results = run_query(audit_query) %}
        {% do audit_results.print_table() %}
        {{ log(audit_query, info=True) }}
    {% endfor %}
{% endif %}

{% endmacro %}

When I dbt run-operation this macro, nothing happens at all in the logs, ~~and there's some circumstantial evidence that it messes up my dbt Cloud Develop pod leading to basic queries of my_new_table and possibly others failing.~~ (the latter was due to non-breaking spaces)

If I remove the print_table() line, the macro compiles and prints the first query in the logs, then goes on to compile all the remaining queries successfully. If I copy any of them from the logs into a statement tab and run it, it works fine and I see the expected table in the results tab.

Expected results

A markdown formatted table based on a successfully compiled SQL query in the logs similar to the ones in the README.

Actual results

The macro runs "successfully" and compares every column, but does not print any of the markdown tables in the logs.

System information

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
  - package: fishtown-analytics/codegen
    version: 0.3.1
  - package: fishtown-analytics/audit_helper
    version: 0.3.0
  - package: gitlabhq/snowflake_spend
    version: 1.2.0
  # dbt_date includes dbt_utils
  - package: calogica/dbt_date
    version: [">=0.3.0"]

Which database are you using dbt with?

The output of dbt --version:

0.19.1

The operating system you're using:
N/A

The output of python --version:
N/A

Additional context

There is a support ticket

Are you interested in contributing the fix?

Yes.

WW Henderson · Answer 1 · Fri Jul 02 2021 20:34:57 GMT+0800 (China Standard Time)

I've had some success substituting {% do log(audit_results.rows.values(), info=True) %}, which prints the row tuples to dbt Cloud's stderr logs. Perhaps the README could warn dbt Cloud users that Agate's print_table() writes to stdout, which is not visible in the dbt Cloud UI, and suggest this workaround.
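
A minimal sketch of that workaround as a reusable macro, assuming the object returned by run_query() is an agate table exposing column_names and rows. The macro name log_agate_table is hypothetical and is not part of audit_helper. It emits a header line and one line per row through log(), so the output does not depend on stdout:

```jinja
{# Hypothetical helper (not part of audit_helper): writes an agate table
   through log() instead of relying on print_table() and stdout. #}
{% macro log_agate_table(results) %}
    {% if execute %}
        {# Header line built from the agate table's column names #}
        {% set header = results.column_names | join(' | ') %}
        {% do log('| ' ~ header ~ ' |', info=True) %}

        {# One log line per row; join() converts each value to a string #}
        {% for row in results.rows %}
            {% set line = row.values() | join(' | ') %}
            {% do log('| ' ~ line ~ ' |', info=True) %}
        {% endfor %}
    {% endif %}
{% endmacro %}
```

In the macro from the issue description, this would replace the print_table() call with {% do log_agate_table(audit_results) %}. The result is not column-aligned like print_table() output, so it trades formatting for visibility in the dbt Cloud logs.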

Fernando Brito · Answer 2 · Wed Jul 28 2021 22:26:31 GMT+0800 (China Standard Time)

I would also suggest updating the docs to include the max_column_width parameter in the advanced usage, like: {% do audit_results.print_table(max_column_width=30) %}.
