seperman / deepdiff

DeepDiff: Deep Difference and search of any Python object/data. DeepHash: Hash of any object based on its contents. Delta: Use deltas to reconstruct objects by adding deltas together.

Home Page: http://zepworks.com

Path.extract can't handle double backslashes

HiddeLekanne opened this issue

Describe the bug
deepdiff.path.extract can't handle double backslashes "\\". It still treats the second backslash, together with the character after it, as an escape sequence.

To Reproduce

from deepdiff import grep, extract

obj = ["something somewhere", {"abc\\bTHIS_b_CANT_BE_HERE": "somewhere", "string": 2, 0: 0, "somewhere": "around"}]
item = "somewhere"
ds = obj | grep(item)

for path in ds["matched_values"]:
    print(extract(obj, path))

This will result in an error:

Traceback (most recent call last):
  File "...\test.py", line 11, in <module>
    print(extract(obj, path))
  File "...\venv\lib\site-packages\deepdiff\path.py", line 169, in extract
    return _get_nested_obj(obj, elements)
  File "...\venv\lib\site-packages\deepdiff\path.py", line 108, in _get_nested_obj
    obj = obj[elem]
KeyError: 'abc\x08THIS_b_CANT_BE_HERE'
something somewhere
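
The KeyError shows the key that extract actually looked up: it contains "\x08", the backspace character, where the real key has a backslash followed by "b". A small check, independent of DeepDiff, of how the two strings differ:

```python
# The dictionary key as written in the report: the source literal "\\" is ONE backslash.
key = "abc\\bTHIS_b_CANT_BE_HERE"

# The key from the KeyError: "\x08" is the backspace control character (same as "\b").
looked_up = "abc\x08THIS_b_CANT_BE_HERE"

print(key == looked_up)                     # False: backslash + "b" vs. one backspace
print(key == r"abc\bTHIS_b_CANT_BE_HERE")   # True: same string, written as a raw literal
print("\b" == "\x08")                       # True: two spellings of backspace
print(len(key) - len(looked_up))            # 1: two characters were collapsed into one
```

So somewhere between grep producing the path and extract resolving it, the backslash and the "b" are being merged into a single escape character.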

Expected behavior
I expect extract to handle a "\\" in the keys of a dictionary.

OS, DeepDiff version and Python version (please complete the following information):

  • OS: [Windows]
  • Version [10]
  • Python Version [3.7]
  • DeepDiff Version [6.3]

Additional context
Fix could be:

for char in path:
    if prev_char == '\\':
        if char != '\\':  # Treat "\\" as a single escape character
            elem += '\\'
        elem += char

Instead of the current:

for char in path:
    if prev_char == '\\':
        elem += char
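
To make the idea above concrete, here is a self-contained sketch of that kind of unescaping. The function name unescape_backslashes and the pending flag are made up for illustration; this is not DeepDiff's actual parser, only the rule "a doubled backslash means one literal backslash, and nothing else is interpreted":

```python
def unescape_backslashes(text: str) -> str:
    """Hypothetical helper, not DeepDiff's actual code.

    Collapses each pair of backslashes into one literal backslash and keeps
    a lone backslash (plus the character after it) unchanged.
    """
    out = []
    pending = False  # True when the previous char was a not-yet-emitted backslash
    for char in text:
        if pending:
            out.append("\\")      # emit one backslash for the pending one
            if char != "\\":
                out.append(char)  # lone backslash: keep the following char too
            pending = False
        elif char == "\\":
            pending = True
        else:
            out.append(char)
    if pending:
        out.append("\\")          # trailing lone backslash
    return "".join(out)


escaped = r"abc\\bTHIS_b_CANT_BE_HERE"  # two backslash characters, as in an escaped path
print(unescape_backslashes(escaped) == "abc\\bTHIS_b_CANT_BE_HERE")  # True
```

With this rule the result still contains a backslash followed by "b", never a backspace character.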

Hi @HiddeLekanne
Thanks for reporting the issue. I will keep this in mind for the next release. PRs are very welcome too!

I would love to do that sometime, but I am really unsure about the design requirements. For the purpose of this bug report I assumed that there is an encode/decode relationship between the path and extract. Given that, I don't understand why the encoding step would even try to interpret backslashes (and other Python string features) in the first place.

In short: why are we not working with raw Python strings (r'string')?

So a bug fix would either add a bunch of if statements to account for the Python string features and reverse them, or switch to working with raw Python strings. I don't know which kind of solution you would prefer.
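
One detail that may help with the design question: a raw literal is only source-code notation, so r'a\b' and 'a\\b' are the same str at runtime. What matters is whether the path decoder exactly reverses whatever the path encoder did. A minimal sketch of such a round trip, assuming (purely for illustration) that keys are written with repr and read back with ast.literal_eval; encode_key and decode_key are hypothetical names, not DeepDiff functions:

```python
import ast

key = "abc\\bTHIS_b_CANT_BE_HERE"          # one literal backslash at runtime
print(key == r"abc\bTHIS_b_CANT_BE_HERE")  # True: raw and escaped literals are equal


def encode_key(k: str) -> str:
    # Hypothetical encoder: repr() escapes backslashes and quotes.
    return repr(k)


def decode_key(s: str) -> str:
    # Hypothetical decoder: literal_eval() reverses exactly what repr() did.
    return ast.literal_eval(s)


encoded = encode_key(key)
print(encoded)                     # the backslash appears doubled in the encoded form
print(decode_key(encoded) == key)  # True: the round trip is lossless
```

Any encode/decode pair works as long as the two sides agree on the escaping rules; the bug in this report looks like a case where they do not.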

Hi @HiddeLekanne
This is fixed in the recent DeepDiff releases.