thorn-oss / perception

Perceptual hashing tools for detecting child sexual abuse material

Home Page: https://perception.thorn.engineering/

Issue with pHash implementation

shubhamjain0594 opened this issue

The issue points at the resize step of the pHash implementation:

image = cv2.resize(

According to Zauner's (2010) thesis, a 7x7 mean filter has to be applied before resizing, and I don't think the current implementation does this.
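Zauner's description smooths the grayscale image with a 7x7 mean filter before it is shrunk. A minimal sketch of that step in plain NumPy, so it runs without OpenCV, assuming a 2-D grayscale array. `mean_filter` is a hypothetical helper, not part of perception. With reflect padding it is intended to give the same values as `cv2.boxFilter(image, -1, (7, 7))` on a float image under OpenCV's default border mode, but that equivalence is an assumption worth checking.

```python
import numpy as np


def mean_filter(image: np.ndarray, ksize: int = 7) -> np.ndarray:
    """Apply a ksize x ksize mean (box) filter to a 2-D grayscale image."""
    if ksize < 1 or ksize % 2 == 0:
        raise ValueError("ksize must be a positive odd integer")
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    pad = ksize // 2
    # numpy's "reflect" mirrors without repeating the edge pixel,
    # like OpenCV's default border mode (BORDER_REFLECT_101).
    padded = np.pad(img, pad, mode="reflect")
    # Summed-area table with a leading row and column of zeros, so each
    # window sum costs four lookups regardless of ksize.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sums = (
        sat[ksize:ksize + h, ksize:ksize + w]
        - sat[:h, ksize:ksize + w]
        - sat[ksize:ksize + h, :w]
        + sat[:h, :w]
    )
    return window_sums / (ksize * ksize)


# A 1-pixel checkerboard is pure high-frequency detail; the 7x7 mean
# flattens it to values within 1/98 of 0.5.
board = (np.indices((16, 16)).sum(axis=0) % 2).astype(np.float64)
smoothed = mean_filter(board)
print(smoothed.min(), smoothed.max())  # 24/49 and 25/49, about 0.49 and 0.51
```

This is the kind of fine detail the filter is meant to remove before downsampling, so that it cannot alias into the low-frequency DCT coefficients the hash is built from.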

Thank you so much for raising this! I will try to get this corrected soon but, of course, would gladly accept a pull request that adds a call to cv2.boxFilter before the resize step. Here are some implementation questions that came to mind while thinking about it, along with my suggested answers.

  • How should we adjust the kernel size depending on the values of hash_size and highfreq_factor? I imagine some kind of linear adjustment, such that hash_size=8 and highfreq_factor=4 give a kernel size of 7 (as in the thesis).
  • How do we preserve the existing behavior while still allowing the "correct" behavior? One option is a keyword argument called boxFilter that defaults to False (the existing behavior). The docstring could then say something to the effect of: "To use this algorithm as described by Zauner (2010), set highfreq_factor=4, hash_size=8, and boxFilter=True."
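The two suggestions above can be combined into one sketch: an opt-in `box_filter` keyword (snake_case here to match `hash_size` and `highfreq_factor`; the comment calls it `boxFilter`) that defaults to `False`, plus a kernel size that scales linearly with the resize target so that `hash_size=8, highfreq_factor=4` gives 7. Everything here is hypothetical and written in plain NumPy: `phash`, `kernel_size`, `mean_filter`, `area_resize`, and `dct_matrix` are illustrative helpers, the block-averaging resize stands in for `cv2.resize`, the orthonormal DCT matrix stands in for `cv2.dct`, and thresholding the top-left `hash_size` x `hash_size` block at its median is one common pHash variant, not a statement of exactly how perception computes it.

```python
import numpy as np


def kernel_size(hash_size: int, highfreq_factor: int) -> int:
    """Hypothetical linear rule: scale Zauner's 7x7 kernel with the resize
    target side (hash_size * highfreq_factor), so 8 * 4 = 32 maps to 7."""
    k = max(1, round(7 * hash_size * highfreq_factor / 32))
    return k if k % 2 == 1 else k + 1  # keep the kernel odd so it has a centre


def mean_filter(image: np.ndarray, ksize: int) -> np.ndarray:
    """ksize x ksize mean filter with reflect (BORDER_REFLECT_101-style) padding."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="reflect")
    h, w = image.shape
    acc = np.zeros((h, w))
    for dy in range(ksize):  # sum the ksize * ksize shifted copies
        for dx in range(ksize):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (ksize * ksize)


def area_resize(image: np.ndarray, size: int) -> np.ndarray:
    """Shrink to size x size by averaging blocks of pixels (needs >= size per side)."""
    h, w = image.shape
    if h < size or w < size:
        raise ValueError("image must be at least size x size")
    rows = np.linspace(0, h, size + 1).astype(int)
    cols = np.linspace(0, w, size + 1).astype(int)
    sums = np.add.reduceat(np.add.reduceat(image, rows[:-1], axis=0), cols[:-1], axis=1)
    return sums / np.outer(np.diff(rows), np.diff(cols))


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: the 2-D transform of x is C @ x @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def phash(image: np.ndarray, hash_size: int = 8, highfreq_factor: int = 4,
          box_filter: bool = False) -> np.ndarray:
    """Boolean hash_size x hash_size pHash of a 2-D grayscale image."""
    img = np.asarray(image, dtype=np.float64)
    if box_filter:  # opt-in: the default path is unchanged
        img = mean_filter(img, kernel_size(hash_size, highfreq_factor))
    side = hash_size * highfreq_factor
    small = area_resize(img, side)
    c = dct_matrix(side)
    low = (c @ small @ c.T)[:hash_size, :hash_size]  # low-frequency corner
    return low > np.median(low)


rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(64, 64))
legacy = phash(img)                   # existing behaviour
zauner = phash(img, box_filter=True)  # 7x7 mean filter before the resize
print(kernel_size(8, 4), int((legacy != zauner).sum()))
```

With `box_filter=False` the filter is never touched, so hashes computed before the change stay reproducible; callers opt in explicitly with `phash(img, box_filter=True)`.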

Of course, let me know if you have any additional feedback or questions. Again, thanks for opening this issue!