dbt-labs / dbt-audit-helper

Useful macros when performing data audits

Home Page: https://hub.getdbt.com/dbt-labs/audit_helper/latest/

compare_column_values() does not work on dbt Cloud

davesgonechina opened this issue

Describe the bug

The examples for compare_column_values use print_table(), but according to Fishtown Analytics support, print_table() does not work in dbt Cloud. At a minimum, the README should carry a disclaimer, especially for the Advanced Usage: the SQL is generated but the table is not, which forces the user to copy, paste, and run each compiled query manually.

Steps to reproduce

Follow the instructions for usage or advanced usage of audit_helper.compare_column_values() in dbt Cloud, then execute the macro with dbt run-operation.

In my case, this:

{% macro audit_my_table() %}
{%- set columns_to_compare=adapter.get_columns_in_relation(ref('my_new_table'))  -%}

{% set old_etl_relation_query %}
    select * from my_old_table
{% endset %}

{% set new_etl_relation_query %}
    select * from {{ ref('my_new_table') }}
{% endset %}

{% if execute %}
    {% for column in columns_to_compare %}
        {{ log('Comparing column "' ~ column.name ~'"', info=True) }}
        {% set audit_query = audit_helper.compare_column_values(
                a_query=old_etl_relation_query,
                b_query=new_etl_relation_query,
                primary_key="my_pk",
                column_to_compare=column.name
        ) %}

        {% set audit_results = run_query(audit_query) %}
        {% do audit_results.print_table() %}
        {{ log(audit_query, info=True) }}
    {% endfor %}
{% endif %}

{% endmacro %}

When I run this macro with dbt run-operation, nothing appears in the logs at all. There was also some circumstantial evidence that it destabilized my dbt Cloud Develop pod, causing basic queries of my_new_table (and possibly others) to fail, but that turned out to be caused by non-breaking spaces.

If I remove the print_table() line, the macro compiles every query and prints each one to the logs. If I copy any of those queries from the logs into a statement tab and run it, it works fine and I see the expected table in the results tab.

Expected results

A markdown formatted table based on a successfully compiled SQL query in the logs similar to the ones in the README.

Actual results

The macro runs "successfully" and compares every column, but none of the markdown tables are printed in the logs.

System information

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
  - package: fishtown-analytics/codegen
    version: 0.3.1
  - package: fishtown-analytics/audit_helper
    version: 0.3.0
  - package: gitlabhq/snowflake_spend
    version: 1.2.0
  # dbt_date includes dbt_utils
  - package: calogica/dbt_date
    version: [">=0.3.0"]

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

0.19.1

The operating system you're using:
N/A

The output of python --version:
N/A

Additional context

There is a support ticket

Are you interested in contributing the fix?

Yes.

I've had some success substituting {% do log(audit_results.rows.values(), info=True) %}, which prints the row tuples to dbt Cloud's stderr logs. Perhaps the README could warn dbt Cloud users that Agate's print_table() writes to stdout, which is not visible in the dbt Cloud UI, and suggest this workaround.
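A minimal sketch of how that workaround could be wrapped up for reuse. The macro name log_audit_results and the ' | ' separator are hypothetical and not part of audit_helper. It relies only on log(), the Agate table's column_names and rows, and Jinja's join filter, and I have only reasoned through it, not tested it across dbt versions:

{% macro log_audit_results(audit_results) %}
    {# Hypothetical helper: writes an Agate table through dbt's logger #}
    {# rather than print_table(), which prints to stdout.              #}
    {% if execute %}
        {# Header line built from the column names. #}
        {% do log(audit_results.column_names | join(' | '), info=True) %}
        {# One log line per row; join() stringifies each value. #}
        {% for row in audit_results.rows %}
            {% do log(row.values() | join(' | '), info=True) %}
        {% endfor %}
    {% endif %}
{% endmacro %}

In the macro above, {% do audit_results.print_table() %} would then become {% do log_audit_results(audit_results) %}. The output is a plain delimited listing, not the aligned table that print_table() produces.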

I would also suggest updating the docs to show the max_column_width parameter in the advanced usage, for example: {% do audit_results.print_table(max_column_width=30) %}.

So instead of:
image

We get:
image