dbt-labs / dbt-audit-helper

Useful macros when performing data audits

Home Page: https://hub.getdbt.com/dbt-labs/audit_helper/latest/

compare_queries can be wrong due to approximation by BigQuery

yu-iskw opened this issue

Describe the bug

I used compare_queries, and the generated view reported differences between the two tables.
However, when I investigated, many records appeared in both a_except_b and b_except_a.

My assumption is that BigQuery's intersect distinct and except distinct compare records only approximately.
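To make the set logic concrete, here is a minimal sketch of the three sets involved, assuming the comparison reduces to plain set operations over full rows. It runs on Python's built-in sqlite3 module rather than BigQuery, so it says nothing about how BigQuery itself compares values. The tables old_users and new_users and their rows are hypothetical stand-ins for the masked tables. It shows that an id lands in both a_except_b and b_except_a exactly when the row exists on both sides but at least one column compares as different.

```python
import sqlite3

# Hypothetical stand-ins for the masked tables; SQLite, not BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table old_users (id integer, name text, email text);
    create table new_users (id integer, name text, email text);
    insert into old_users values
        (1, 'ann', 'ann@example.com'),
        (2, 'bob', 'bob@example.com'),
        (3, 'cy',  'cy@example.com');
    insert into new_users values
        (1, 'ann', 'ann@example.com'),
        (2, 'bob', 'bob@example.org'),
        (4, 'dee', 'dee@example.com');
    """
)


def rows(sql):
    """Run a query and return its rows sorted, for stable comparison."""
    return sorted(conn.execute(sql).fetchall())


# In SQLite, EXCEPT and INTERSECT already have distinct semantics.
a_intersect_b = rows("select * from old_users intersect select * from new_users")
a_except_b = rows("select * from old_users except select * from new_users")
b_except_a = rows("select * from new_users except select * from old_users")

# Primary keys present in both difference sets: the row exists on both
# sides, but some column value is not considered equal.
ids_in_both = sorted({r[0] for r in a_except_b} & {r[0] for r in b_except_a})

print("a_intersect_b:", a_intersect_b)
print("a_except_b:   ", a_except_b)
print("b_except_a:   ", b_except_a)
print("ids in both difference sets:", ids_in_both)
```

Listing the ids that appear in both difference sets and then inspecting those rows column by column is one way to see which column the database treats as unequal.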

Steps to reproduce

I used the macro as shown below. The project ID and table names are masked.

{% set old_table_query %}
  select *
  from `xxxxxxxx.old.users`
{% endset %}

{% set new_table_query %}
  select *
  from `xxxxxxxx.new.users`
{% endset %}

{{ audit_helper.compare_queries(
    a_query=old_table_query,
    b_query=new_table_query,
    primary_key="id"
) }}

Expected results

All records should match.

Actual results

Some records are reported as mismatched.

Screenshots and log output

[Screenshot: Screen Shot 2020-10-21 at 3 20 35 PM]

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

$ dbt --version
installed version: 0.18.1
   latest version: 0.18.1

Up to date!

Plugins:
  - bigquery: 0.18.1
  - snowflake: 0.18.1
  - redshift: 0.18.1
  - postgres: 0.18.1

The operating system you're using:
Mac OS 10.15.5

The output of python --version:
Python 3.7.7

Are you interested in contributing the fix?

Yes, I have an idea to solve this.

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.