Path.extract can't handle double backslashes

HiddeLekanne opened this issue a year ago.

Describe the bug
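deepdiff.path.extract can't handle double backslashes "\\" in dictionary keys. It still treats the backslash as the start of an escape sequence together with the character that follows it, so the "\b" in the key below becomes the backspace character "\x08" and the lookup fails.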
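
The KeyError below is consistent with the quoted key being decoded as a Python string literal. A minimal sketch of that effect, using the standard library's ast.literal_eval as a stand-in (an assumption for illustration; this report does not show which mechanism deepdiff.path actually uses):

```python
from ast import literal_eval

# The key from the report: "abc", ONE literal backslash, then "bTHIS_b_CANT_BE_HERE".
key = "abc\\bTHIS_b_CANT_BE_HERE"

# Assumption: the path embeds the key verbatim between quotes, without escaping it.
quoted = "'" + key + "'"

# Decoding that text as a Python string literal turns "\b" into a backspace.
decoded = literal_eval(quoted)

print(repr(decoded))   # 'abc\x08THIS_b_CANT_BE_HERE'
print(decoded == key)  # False, so obj[decoded] raises KeyError
```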

To Reproduce

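from deepdiff import grep, extract

# The first dict key is "abc", one literal backslash, then "bTHIS_b_CANT_BE_HERE".
obj = ["something somewhere", {"abc\\bTHIS_b_CANT_BE_HERE": "somewhere", "string": 2, 0: 0, "somewhere": "around"}]
item = "somewhere"
ds = obj | grep(item)  # grep lists the paths of matching values under "matched_values"

# Every path reported by grep should resolve back to its value via extract.
for path in ds["matched_values"]:
    print(extract(obj, path))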

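This prints "something somewhere" for the first match and then raises an error on the key containing the backslash (in the captured console output, the printed line appears after the traceback):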

Traceback (most recent call last):
  File "...\test.py", line 11, in <module>
    print(extract(obj, path))
  File "...\venv\lib\site-packages\deepdiff\path.py", line 169, in extract
    return _get_nested_obj(obj, elements)
  File "...\venv\lib\site-packages\deepdiff\path.py", line 108, in _get_nested_obj
    obj = obj[elem]
KeyError: 'abc\x08THIS_b_CANT_BE_HERE'
something somewhere

Expected behavior
I expect extract to handle a "\\" in dictionary keys, i.e. to return the matched value for every path that grep reports.

OS, DeepDiff version and Python version:

OS: Windows 10
Python version: 3.7
DeepDiff version: 6.3

Additional context
A possible fix in the path parser (deepdiff/path.py):

for char in path:
    if prev_char == '\\':
        if char != '\\':  # Treat "\\" as a single escape character
            elem += '\\'
        elem += char

instead of the current code:

for char in path:
    if prev_char == '\\':
        elem += char
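
The rule behind the proposed fix (a doubled backslash means one literal backslash) can be sketched as a standalone escape/unescape pair. This assumes the code that writes paths escapes keys the same way the parser unescapes them, and additionally escapes single quotes; escape_element and unescape_element are hypothetical names, not part of DeepDiff. The sketch only shows that such a scheme round-trips keys like the one in this report.

```python
# Hypothetical helpers illustrating the escaping rule; not part of DeepDiff.


def escape_element(key: str) -> str:
    # Double every backslash first, then escape single quotes. Doing it in the
    # other order would double the backslash that was just added before a quote.
    return key.replace("\\", "\\\\").replace("'", "\\'")


def unescape_element(text: str) -> str:
    # A backslash followed by a backslash or a single quote is an escape:
    # drop it and keep the next character. Any other backslash stays literal.
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ("\\", "'"):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


for key in ["abc\\bTHIS_b_CANT_BE_HERE", "ends with \\", "it's", "a\\'b", "plain"]:
    assert unescape_element(escape_element(key)) == key

# A lone backslash followed by another character (as in the report) is kept as-is.
assert unescape_element("abc\\bTHIS_b_CANT_BE_HERE") == "abc\\bTHIS_b_CANT_BE_HERE"
```

Decoding alone is only half of the contract: if the writer does not double backslashes, a key that itself contains two consecutive backslashes would be decoded to one.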

Sep Dehpour · Fri Jul 07 2023 02:02:54 GMT+0800 (China Standard Time)

Hi @HiddeLekanne
Thanks for reporting the issue. I will keep this in mind for the next release. PRs are very welcome too!

HiddeLekanne · Mon Jul 10 2023 16:30:01 GMT+0800 (China Standard Time)

I would love to do that sometime, but I am unsure about the design requirements. For this bug report I assumed an encode/decode relationship between the paths that grep produces and extract. Given that, I don't understand why backslashes (and other Python string features) get interpreted at all.

In short: why are we not working with raw Python strings (r'string')?

So a fix would either add a number of special cases that reverse each Python string feature, or switch to raw Python strings. I don't know which solution you would prefer.
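
One note on the raw-string idea: the r'...' prefix only changes how Python source code is parsed, and at runtime there is no "raw" str, so it cannot by itself change how a path string is decoded. A third option is to make both directions speak the same Python-literal syntax: write each key with repr() and read it back with ast.literal_eval. The sketch below assumes DeepDiff-style bracketed path elements; key_to_path_element and path_element_to_key are hypothetical names, and the thread does not say how DeepDiff actually fixed the issue.

```python
from ast import literal_eval


def key_to_path_element(key) -> str:
    # Hypothetical: write the key as a Python literal. repr() escapes
    # backslashes and quotes, so the text can be read back unambiguously.
    return "[" + repr(key) + "]"


def path_element_to_key(element: str):
    # Hypothetical: strip the brackets and parse the literal back.
    # literal_eval only accepts Python literal syntax.
    return literal_eval(element[1:-1])


for key in ["abc\\bTHIS_b_CANT_BE_HERE", "it's", 'say "hi"', "tab\there", 0]:
    element = key_to_path_element(key)
    assert path_element_to_key(element) == key
```

Because repr() and literal_eval are inverses for strings and integers, no special cases are needed for individual escape sequences.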

Sep Dehpour · Sun Nov 19 2023 23:17:53 GMT+0800 (China Standard Time)

Hi @HiddeLekanne
This is fixed in the recent DeepDiff releases.