securisec / chepy

Chepy is a python lib/cli equivalent of the awesome CyberChef tool.

Files loaded in "r" mode are auto decoded (potentially unexpected/unwanted)

geekscrapy opened this issue · comments

When I load a text file, its contents are decoded automatically. I believe this is because the file is opened in text mode ("r") rather than binary mode ("rb"):

with open(path, "r") as f:

This causes problems when the exact bytes need to be manipulated.

For example, the file below is loaded successfully by the first open statement (line 747), which automatically decodes its contents. This is fine in most cases, but it changes the loaded contents; the file should instead have been loaded by the statement on line 750.

$ xxd file.bin
00000000: 6865 6c6c 6f24 653d 3133 0d0a            hello$e=13..

The above file has the MD5 hash b2d3abb022e881225d9b1fc1b7cff2ae. However, when it is loaded as a file using .load_file() and passed through .md5(), the hash comes out as 33f3ba396fa287739afefa64a715630d, which is incorrect.

This can be fixed by always opening files in "rb" mode.
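The difference can be reproduced without chepy. The following is a minimal sketch using only the Python standard library; the helper names (md5_text_mode, md5_binary_mode, demo) are made up for illustration and are not part of chepy. It assumes the mismatch comes from Python's text mode translating the trailing CRLF into LF, which is standard behaviour for open(path, "r") but has not been confirmed here against chepy's own code.

```python
import hashlib
import os
import tempfile

# The exact 12 bytes from the xxd dump above; note the trailing CRLF (0d 0a).
RAW = bytes.fromhex("68656c6c6f24653d31330d0a")


def md5_text_mode(path):
    """Read in text mode ("r"), re-encode, then hash (illustrative helper)."""
    with open(path, "r") as f:
        return hashlib.md5(f.read().encode("utf-8")).hexdigest()


def md5_binary_mode(path):
    """Read in binary mode ("rb") and hash the bytes unchanged."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def demo():
    """Write RAW to a temp file and return (text contents, text md5, binary md5)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.bin")
        with open(path, "wb") as f:
            f.write(RAW)
        with open(path, "r") as f:
            text = f.read()  # universal newlines turn "\r\n" into "\n"
        return text, md5_text_mode(path), md5_binary_mode(path)


if __name__ == "__main__":
    text, text_md5, binary_md5 = demo()
    print(repr(text))
    print("text mode  :", text_md5)
    print("binary mode:", binary_md5)
```

If the assumption holds, the two printed digests should correspond to the two hashes quoted in this issue: the binary-mode digest to the real file hash and the text-mode digest to the one chepy reports.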

To recreate:

  1. Create a file with the MD5 hash of b2d3abb022e881225d9b1fc1b7cff2ae:
    $ echo "00000000: 6865 6c6c 6f24 653d 3133 0d0a hello$e=13.." | xxd -r > test.bin
  2. Load and hash the file:
    Chepy('test.bin').load_file().md5().o
  3. This results in the output of 33f3ba396fa287739afefa64a715630d (incorrect)
  4. Change file open mode to "rb" on line
    with open(path, "r") as f:
  5. Repeat steps 2 + 3. This should provide the correct output of b2d3abb022e881225d9b1fc1b7cff2ae

I propose that all files be opened in "rb" mode (preferred option), or that the user be given an argument to load the file as bytes (a second option, if the first one breaks existing methods!).
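As a rough sketch of what the two options could look like together: the function below always opens the file in "rb" and only decodes when asked. The name read_file and its as_bytes and encoding parameters are hypothetical; this is not chepy's load_file signature, just an illustration of the proposal.

```python
import os
import tempfile


def read_file(path, as_bytes=True, encoding="utf-8"):
    """Hypothetical loader: always open in "rb"; decode only on request."""
    with open(path, "rb") as f:
        data = f.read()
    if as_bytes:
        return data
    # bytes.decode() does not translate line endings, so CRLF survives.
    return data.decode(encoding)


def demo():
    """Round-trip the sample bytes through both modes of read_file."""
    raw = b"hello$e=13\r\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.bin")
        with open(path, "wb") as f:
            f.write(raw)
        return read_file(path), read_file(path, as_bytes=False)
```

Decoding the bytes explicitly, rather than relying on text mode, keeps the line endings intact even when a string is wanted.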

As .load_file stands, chepy makes assumptions about what the user wants (in this case, decoding automatically). This causes issues when the file contents should be loaded as-is (byte for byte).

I caught this as I use chepy to load a file and then send it to a local service which returns the md5 hash of the provided data.

This is as intended. Chepy will first try to load a file as text, and if it fails to decode it, it will load it as binary. If a specific format is required, the data can be loaded into a variable first and then passed to chepy (no need to use .load_file then).
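The behaviour described here can be sketched roughly as follows. This is an illustration of the try-text-then-binary pattern, not chepy's actual source; the function name load_text_then_binary is made up, and the explicit encoding="utf-8" is an assumption added so the example behaves the same on every platform.

```python
import os
import tempfile


def load_text_then_binary(path, encoding="utf-8"):
    """Try text mode first; fall back to raw bytes if decoding fails."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, "rb") as f:
            return f.read()


def demo():
    """Load one decodable CRLF file and one file that is not valid UTF-8."""
    samples = {"crlf.txt": b"hello$e=13\r\n", "blob.bin": b"\xff\xfe\r\n"}
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, raw in samples.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(raw)
            results[name] = load_text_then_binary(path)
    return results
```

With this pattern, a file that fails to decode comes back byte for byte, while a decodable file with CRLF line endings comes back as a string with the carriage return dropped, which is the case this issue is about.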

I get that this is by design, but I believe the consequences of this decision have been overlooked. The idea of CyberChef was to make no assumptions about what the user wants to do with the data. Chepy breaks this philosophy by loading and decoding data on the user's behalf, and in doing so it changes the data.

Currently, if a user loads a text file via the CLI and wants to md5 it, chepy will give them the wrong MD5 hash! This is more than an edge case.

It is working as designed.

Thanks, can you run the following and provide the output?
# xxd 1.py

Also, if you use the file I provided above:

$ echo "00000000: 6865 6c6c 6f24 653d 3133 0d0a            hello$e=13.." | xxd -r > test.bin
(py3) $ chepy test.bin
>>> load_file
hello$e=13

>>> md5
33f3ba396fa287739afefa64a715630d
>>>
OKBye
(py3) $ md5sum test.bin
MD5 (test.bin) = b2d3abb022e881225d9b1fc1b7cff2ae

This shows two different MD5 hashes for the same file, caused by not loading it in "rb" mode.