thorn-oss / perception

Perceptual hashing tools for detecting child sexual abuse material

Home Page: https://perception.thorn.engineering/


This project is fundamentally flawed

stevefan1999-personal opened this issue · comments

Some teenagers can already have questionable appearances, especially adolescents. So how do you distinguish teens from adults? What about women with small breasts? And barely-legals?

Also, East Asian women often don't show their age the way Westerners do; some of them might look 14 despite being in their 20s or even 30s, so is this micro-racism against Asians as well?

Man, this project is so Australian.

Thank you for the questions @stevefan1999-personal, but to clarify: this repo is not an image classifier. It is a perceptual hashing library/utility.

https://en.wikipedia.org/wiki/Perceptual_hashing

It helps people compare two images and determine if they are the same.
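For concreteness, here is a minimal sketch of that comparison, assuming the PHash hasher and compute/compute_distance calls from this repo's documentation; the file names are placeholders.

```python
# Minimal sketch: compare two images by perceptual hash and measure how close
# they are. Assumes the hashers.PHash interface from perception's documentation;
# the file names are placeholders.
from perception import hashers

hasher = hashers.PHash()
hash1 = hasher.compute("known_image.jpg")
hash2 = hasher.compute("candidate_image.jpg")

# A small distance means the two files are (near-)duplicates of each other.
# It says nothing about what either image depicts.
print(hasher.compute_distance(hash1, hash2))
```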

@davidrs I know what phash is and I'm questioning the existence of thorn.

This tool will be invalid as well if its premise is wrong, namely "what is supposed to count as so-called CSAM". I have already elaborated my point.

So are you not going to identify what is supposed to be CSAM first, and then phash it using this tool?

Besides, how do you check for steganography? Some innocent-looking photos/images could include hidden messages inside, probably another bit stream. Steganography is famous in CTFs, you know 🤷‍♂️

Thanks @stevefan1999-personal, but closing the issue as this does not relate to this code library.

@davidrs I think you are sweeping this under the rug a little too quickly. Can you justify the designed hash properties that make this a good fit for the claimed usage?

How do you defend against semi-transparent overlaying?
By superimposing nearly transparent offensive material onto a picture you want to censor, with various levels of alpha, you can create collisions in your perceptual hash.
You can create a continuous chain of similar images from the good image to the offensive image.

Somewhere, some poor person labelling the data will flag the nearly transparent overlay as offensive, which creates a collision in the perceptual hash; your whole chain of similar images then collapses into being flagged as offensive, and the original image gets flagged as offensive too.

It can be weaponized to target any individual or website. You take some of their published images, overlay a transparent offensive image at various levels of alpha, and flood-republish. You do that a bunch of times with a script. The blended content gets flagged, its hash gets added to the database, the original content gets flagged as offensive, and the user/website is tanked.
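A rough sketch of that chain, using Pillow and the third-party imagehash package as stand-ins; file names and the step count are arbitrary.

```python
# Rough sketch of the attack described above: blend a benign image into a
# target image in small alpha steps and check how close neighbouring steps are
# under a perceptual hash. Uses Pillow and the third-party "imagehash" package;
# file names and the step count are placeholders.
from PIL import Image
import imagehash

benign = Image.open("benign.jpg").convert("RGB").resize((512, 512))
target = Image.open("target.jpg").convert("RGB").resize((512, 512))

# 21 images running from 100% benign (alpha=0) to 100% target (alpha=1).
chain = [Image.blend(benign, target, alpha / 20) for alpha in range(21)]
hashes = [imagehash.phash(img) for img in chain]

# Adjacent steps typically differ by only a few bits, so once any single step
# is labelled and its hash stored, its neighbours (and eventually the benign
# endpoint) match it too.
for i in range(len(hashes) - 1):
    print(f"step {i} -> {i + 1}: Hamming distance {hashes[i] - hashes[i + 1]}")
```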

@stevefan1999-personal We assume that this tool is meant to deal with "clear" or "apparent" images. If you have steganography within images, you should use https://github.com/DominicBreuker/stego-toolkit to extract the hidden image, then use thorn/perception to find a match in the child abuse database that you are using to identify problematic files.
This is not a classifier that recognizes pictures not already confirmed to be images of child abuse; it is only a matcher that checks whether an image is a well-known problem file. The same applies to copyright detection: without an established dataset we don't know whether the material is copyrighted.
@unrealwill the thing is, you can extract semi-transparent overlays into a separate file for detection, thus cracking their system of data hiding. These are good examples: https://github.com/BlueCocoa/hoshizora https://github.com/Coxxs/image-hide https://github.com/TomWildenhain/MagicPNG
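A quick sketch of that kind of extraction for the alpha-channel style of hiding used by the linked projects, assuming Pillow; the file name is a placeholder.

```python
# Quick sketch: split the alpha plane out as its own image so it can be
# inspected or hashed separately from the visible colour layers.
from PIL import Image

img = Image.open("suspect.png").convert("RGBA")

rgb_part = img.convert("RGB")         # the visible colour layers
alpha_part = img.getchannel("A")      # the overlay/hidden plane, as grayscale

rgb_part.save("suspect_rgb.png")
alpha_part.save("suspect_alpha.png")  # each file can now be hashed on its own
```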

@DonaldTsang The projects you suggested are interesting, but they are a different idea. To do what I suggested you shouldn't use the alpha channel; you hard-mix the two images into a single RGB image. Separating the images afterwards is then much harder. The goal isn't to hide anything, but to show enough for a human to be able to flag it as bad while still making the perceptual hash collide.
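For concreteness, that hard mix is just a weighted blend that leaves no alpha plane behind; a minimal sketch, assuming Pillow and placeholder file names.

```python
# Sketch of the hard mix: a plain weighted blend of two RGB images.
from PIL import Image

benign = Image.open("benign.jpg").convert("RGB").resize((512, 512))
target = Image.open("target.jpg").convert("RGB").resize((512, 512))

mixed = Image.blend(benign, target, 0.15)  # 15% overlay: faint but visible to a labeller

# ('R', 'G', 'B') -- there is no alpha plane left for a channel-splitter to peel off.
print(mixed.getbands())
mixed.save("mixed.jpg")
```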

@unrealwill really, those other projects can "supplement" this project: you can attach a pre-processor (not part of this project; it should live in a separate repo) to defeat obfuscation techniques, extract image layers, and analyse the parts individually in perception.
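One rough sketch of such a pre-processor, with Pillow and the imagehash package standing in for perception; the helper names, threshold, file names, and known-hash entries are all hypothetical.

```python
# Rough sketch of the pre-processor idea: derive candidate layers from a file,
# hash each one, and compare against a set of already-known hashes.
from PIL import Image
import imagehash

KNOWN_HASHES = {imagehash.hex_to_hash("0f3a5c1e9b2d4477")}  # placeholder entry
MATCH_THRESHOLD = 8  # max Hamming distance treated as a match (illustrative)

def derive_layers(path):
    """Yield the plain image plus any separable planes worth checking."""
    img = Image.open(path)
    yield img.convert("RGB")
    if "A" in img.getbands():          # alpha-channel overlays, as discussed above
        yield img.getchannel("A").convert("RGB")

def matches_known(path):
    for layer in derive_layers(path):
        layer_hash = imagehash.phash(layer)
        if any(layer_hash - known <= MATCH_THRESHOLD for known in KNOWN_HASHES):
            return True
    return False

print(matches_known("suspect.png"))
```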