Tom64b / dHash

dHash = Difference Hash = a quick algorithm to compare images visually


MySQL Bit Count

willbrow opened this issue

Hi

Firstly, thanks for your work, as this will help my project a lot.

I am going to be storing hashes generated with your PHP code in a MySQL database, and there will be around 100,000 of them.

I then want to query that database to check whether a duplicate or similar image is already in it, returning matching rows with a Hamming distance of up to 10. I thought MySQL's BIT_COUNT would be the best way to do this, but I wanted to ask for your thoughts and advice, as I thought BIT_COUNT was integer-only and there may be a better way.
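For reference, the comparison being asked about is XOR followed by a count of set bits. Here is a minimal Python sketch of that calculation, assuming the hash is available as a 16-character hex string (that format, and the helper names, are assumptions for illustration and not taken from the PHP code). A 64-bit hash fits in a 64-bit integer, so in MySQL the same idea would presumably be written as `BIT_COUNT(hash ^ ?)` against an unsigned 64-bit column, where `hash` is a hypothetical column name.

```python
MASK64 = (1 << 64) - 1  # keep values inside 64 bits


def hash_from_hex(hex_hash: str) -> int:
    """Parse a hex-encoded 64-bit hash into an integer (assumed input format)."""
    return int(hex_hash, 16) & MASK64


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two 64-bit hashes differ."""
    return bin((a ^ b) & MASK64).count("1")


a = hash_from_hex("00ff00ff00ff00ff")
b = hash_from_hex("00ff00ff00ff0000")
print(hamming_distance(a, b))  # 8: only the lowest byte differs
```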

Hi,
I am glad you like it.
I never thought about how to store it in MySQL. For my needs I just calculate the hashes and use them (almost) right away.
Here's an answer on StackOverflow which seems reasonable to me:
https://stackoverflow.com/questions/21037578/

However, if you want something really fast, you could build on the idea below:
Since a hash is 64 bits, you could treat it as 8 bytes, calculate the number of ones in each byte, and store these values in MySQL along with the number of ones in the whole 64-bit hash.
Now, if you want to find hashes in your database at a distance of 10 or less, first calculate the number of ones in the whole query hash. Let's say it's 24. Two hashes within distance 10 can differ in their number of ones by at most 10, so you only need to look at hashes in the DB which have 14-34 ones. Then you compare the 8 per-byte counts in a similar fashion. This only narrows down the candidates; the rows that remain still need an exact distance check.
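One way to read the idea above, sketched in Python rather than SQL: the total count of ones and the per-byte counts act as cheap filters, and only the survivors get the exact XOR comparison. The function names are hypothetical, and the per-byte rule used here (the sum of per-byte count differences can never exceed the true distance) is one interpretation of "in similar fashion", not necessarily the one intended. In a database the counts would be stored columns; here they are recomputed to keep the sketch self-contained.

```python
from typing import Iterable, List


def popcount(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def byte_popcounts(h: int) -> List[int]:
    """Set-bit counts for each of the 8 bytes of a 64-bit hash, lowest byte first."""
    return [popcount((h >> (8 * i)) & 0xFF) for i in range(8)]


def passes_prefilter(query: int, candidate: int, max_dist: int) -> bool:
    """Cheap necessary conditions for Hamming distance <= max_dist."""
    # Whole-hash bound: e.g. 24 ones and max_dist 10 allows 14..34 ones.
    if abs(popcount(query) - popcount(candidate)) > max_dist:
        return False
    # Per-byte bound: each byte's count difference is a lower bound on
    # that byte's share of the distance, so the sum is a lower bound too.
    gap = sum(abs(q - c) for q, c in
              zip(byte_popcounts(query), byte_popcounts(candidate)))
    return gap <= max_dist


def find_similar(query: int, stored: Iterable[int], max_dist: int = 10) -> List[int]:
    """Filter first, then confirm with the exact distance."""
    return [h for h in stored
            if passes_prefilter(query, h, max_dist)
            and popcount(query ^ h) <= max_dist]
```

The filters never discard a true match, so the result is the same as a brute-force scan; how much work they save depends on how the stored hashes are distributed.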

I will do some testing and report back with the best working method and code.

I will be comparing a single image against 100,000 images to find similar ones. To reduce false positives I will set a low Hamming distance, but can I ask: have you tried the 512-bit hash, and if so, were the results better?

I tried changing 9x8 to 17x17 to produce the 512-bit hash, but only tried it quickly, so I will spend some time on the loop to see if I can get this to work.
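As a quick sanity check on grid sizes, here is a small Python sketch of the bit-count arithmetic. It assumes the usual row-wise dHash, where each pixel is compared with its right-hand neighbour so that a thumbnail of width w and height h yields (w - 1) * h bits; whether the PHP loop works exactly this way is an assumption, and `dhash_bits` is a hypothetical helper.

```python
def dhash_bits(width: int, height: int) -> int:
    """Bits produced by a row-wise dHash on a width x height thumbnail (assumed scheme)."""
    return (width - 1) * height


for size in [(9, 8), (17, 17), (17, 16), (33, 16)]:
    print(size, dhash_bits(*size))
```

Under that assumption 9x8 gives 64 bits, 17x17 gives 272, 17x16 gives 256, and 33x16 gives 512, so the target hash length determines which grid to try.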