seperman / deepdiff

DeepDiff: Deep Difference and search of any Python object/data. DeepHash: Hash of any object based on its contents. Delta: Use deltas to reconstruct objects by adding deltas together.

Home Page: http://zepworks.com

Unexpected Repetition of Elements when Generating Delta Between Dictionaries

kfirc opened this issue

Describe the bug
When using the deepdiff library to discern the differences between two dictionaries and generate a delta, an unexpected repetition of elements occurs.

To Reproduce

from deepdiff import DeepDiff, Delta

d1 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}]}
d2 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}

deep_diff_result = DeepDiff(d1, d2, exclude_regex_paths=[r"(?=root.*\['id'\])"], ignore_order=True, report_repetition=True)

result = d2 + Delta(deep_diff_result)
print(result)
  1. Take two dictionaries:
d1 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}]}
d2 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}
  2. Use the following code to compare them:
from deepdiff import DeepDiff, Delta
deep_diff_result = DeepDiff(d1, d2, exclude_regex_paths=[r"(?=root.*\['id'\])"], ignore_order=True, report_repetition=True)
  3. Check the output:
{'repetition_change': {"root['a'][0]": {'old_repeat': 3, 'new_repeat': 4, 'old_indexes': [0, 1, 2], 'new_indexes': [0, 1, 2, 3], 'value': {'id': 1}}}}
  4. Apply the delta to d2:
result = d2 + Delta(deep_diff_result)
print(result)

Expected behavior
I anticipated the only difference between d1 and d2 to be the {'id': 4} entry.
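
For reference, the expected difference can be sketched in plain Python without DeepDiff. The helper `added_items` below is hypothetical (it is not part of the library) and only illustrates an order-insensitive comparison of the two lists:

```python
d1 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}]}
d2 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}


def added_items(old, new):
    """Return items of `new` that have no equal counterpart in `old`, ignoring order."""
    remaining = list(old)  # copy so the caller's list is not modified
    added = []
    for item in new:
        if item in remaining:
            remaining.remove(item)  # consume one matching item
        else:
            added.append(item)
    return added


added = added_items(d1['a'], d2['a'])
print(added)  # [{'id': 4}]
```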

OS, DeepDiff version and Python version (please complete the following information):

  • OS: macOS
  • OS Version: Ventura 13.4.1
  • Python Version: 3.9.0
  • DeepDiff Version: 6.3.0

Additional context
The result produced was {'a': [{'id': 1}, {'id': 1}, {'id': 1}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}, which had four repetitions of {'id': 1}.

I've not found any similar issue on Stack Overflow, and I've reviewed open and closed issues on the deepdiff GitHub repository without identifying any similar scenarios.

Related Research: I've looked through Delta Documentation, but it didn't provide clarity for this particular case.

Thanks in advance

Hi @kfirc
Thanks for reporting this.
What is happening here is that exclude_regex_paths is not working properly with report_repetition:

In [1]: from deepdiff import DeepDiff, Delta
   ...:
   ...: d1 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}]}
   ...: d2 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}
   ...:
   ...: deep_diff_result = DeepDiff(d1, d2, exclude_regex_paths=[r"(?=root.*\['id'\])"], ignore_order=True, report_repetition=True)
   ...:

In [2]: deep_diff_result
Out[2]:
{'repetition_change': {"root['a'][0]": {'old_repeat': 3,
   'new_repeat': 4,
   'old_indexes': [0, 1, 2],
   'new_indexes': [0, 1, 2, 3],
   'value': {'id': 1}}}}

In [3]: deep_diff_result = DeepDiff(d1, d2, ignore_order=True, report_repetition=True)

In [4]: deep_diff_result
Out[4]: {'iterable_item_added': {"root['a'][3]": {'id': 4}}}

In your case, the diff tells the Delta object that {'id': 1} needs to be repeated 4 times. That's why you get the unexpected result.
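
One plausible reading of the output above, offered as a simplified model and not as DeepDiff's actual implementation: the pattern matches the path of every 'id' key, so once those keys are excluded each dictionary in the list has nothing left to distinguish it, and the list looks like the same item repeated. The sketch below uses only the standard library; the hand-built path strings and the `visible_part` helper are assumptions made for illustration.

```python
import re
from collections import Counter

d1 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}]}
d2 = {'a': [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}

# The exclusion pattern from the report.
pattern = re.compile(r"(?=root.*\['id'\])")


def visible_part(item, index):
    """Drop keys whose (hand-built) path matches the exclusion pattern."""
    return {
        key: value
        for key, value in item.items()
        if not pattern.search(f"root['a'][{index}][{key!r}]")
    }


# Count how often each "visible" item occurs in the old and new lists.
old_counts = Counter(repr(visible_part(item, i)) for i, item in enumerate(d1['a']))
new_counts = Counter(repr(visible_part(item, i)) for i, item in enumerate(d2['a']))

print(old_counts)  # Counter({'{}': 3})
print(new_counts)  # Counter({'{}': 4})
```

Under this model the counts line up with old_repeat 3 and new_repeat 4 in the reported diff. Dropping exclude_regex_paths, as in In [3] above, keeps the items distinct and yields the expected iterable_item_added entry.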