data-diff fails to identify unique composite primary keys
gabrielleberanger opened this issue · comments
Describe the bug
I am running data-diff on a model having a composite primary key, configured as such:
version: 2
models:
- name: bse_metabase__permissions_group_membership
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns: ['user_id', 'group_id']
But the data-diff execution fails on the _test_duplicate_keys
method (while there are actually no duplicates).
Suggested resolution
data-diff checks for duplicates using CONCAT(<field_1>, <field_2>)
, but CONCAT(<field_1>, '-', <field_2>)
would be more appropriate.
Let's assume that we have the following rows:
- Row 1:
user_id = 1
xgroup_id = 11
- Row 2:
user_id = 11
xgroup_id = 1
In the above example:
CONCAT(<field_1>, <field_2>)
returns111
for both rows (hence the error raised)CONCAT(<field_1>, '-', <field_2>)
returns1-11
and11-1
, which is more robust
@gabrielleberanger Do you still want to work on this pull request? I remember us talking about this in the dbt community slack!
@sungchun12 yes, I do!
I'm starting to work on it right now, so you should receive a PR from me by the end of this week.
Lovely, look forward to reviewing it when it's ready!