thorn-oss / perception

Perceptual hashing tools for detecting child sexual abuse material

Home Page: https://perception.thorn.engineering/

Image reading function does not support non-latin paths

fireattack opened this issue

Most, if not all, of the functions do not work with non-Latin (Unicode) paths/filenames, at least on Windows.

The reason is that cv2.imread has very poor support for Unicode paths (opencv/opencv#4292), and the OpenCV maintainers have no plan to fix it.

As a temporary workaround, I have to patch it at

    elif isinstance(filepath_or_buffer, str):
        if validators.url(filepath_or_buffer):
            return read(request.urlopen(filepath_or_buffer, timeout=timeout))
        if not os.path.isfile(filepath_or_buffer):
            raise FileNotFoundError('Could not find image at path: ' +
                                    filepath_or_buffer)
        image = cv2.imread(filepath_or_buffer)

with something ugly like this

        with PIL.Image.open(filepath_or_buffer) as im:
            _ = im.convert("RGB")
        return np.array(_)
        # image = cv2.imread(filepath_or_buffer)

# Adapted from the `if PIL is not None and isinstance(filepath_or_buffer, PIL.Image.Image):` case above

This works because Pillow has much better support for non-Latin paths.

Things like

        # Read the bytes through Python's own open(), then decode in memory
        with open(filepath_or_buffer, "rb") as f:
            image = np.asarray(bytearray(f.read()), dtype=np.uint8)
        image = cv2.imdecode(image, cv2.IMREAD_UNCHANGED)
# Again, adapted from the `if isinstance(filepath_or_buffer, (io.BytesIO, client.HTTPResponse)):` case above

or simply

        image = cv2.imdecode(np.fromfile(filepath_or_buffer, dtype=np.uint8), cv2.IMREAD_UNCHANGED)

could also work.

I hope we can get a proper fix for this.