westonplatter / phashion

Ruby wrapper around pHash, the perceptual hash library for detecting duplicate multimedia files

PNGs always give the same fingerprint, JPGs work fine

toadle opened this issue · comments

Hey there,

I just found this little gem and would really like to use it in one of my projects. Sadly, every PNG gives me the same fingerprint hash. I have to convert my files to JPG in order to get a usable value.

Am I doing something wrong, or might this be a bug?

2.0.0-p247 :010 > img2 = Phashion::Image.new('icon.png').fingerprint
 => 54086765383280 
2.0.0-p247 :011 > img2 = Phashion::Image.new('icon-pro.png').fingerprint
 => 54086765383280 
2.0.0-p247 :012 > img2 = Phashion::Image.new('icon-pro.jpg').fingerprint
 => 874876298511921100 

I see that exact fingerprint too.

@fny: Today I noticed that my system is giving me strange behaviour on other PNGs I try to process, too.
What kind of system are you on? What is your libpng version?

Mine is libpng/1.5.14 installed through Homebrew on OSX 10.9

Identical. I'll give this a shot on an Ubuntu install.

I had the problem that RMagick wouldn't crop my PNGs right.
Perhaps you can let me know how this goes on Ubuntu. Otherwise I could also try this.

I got the same problem (OSX 10.9, ruby-2.0.0p247, libpng through macports) but only for png-files with alpha channels. So I guess this issue is related to #11

Did anybody try Linux so far?

@toadle I tried running tests I added (55f1a26) on linux (Ubuntu 12.04 with libpng12-dev 1.2.46-3ubuntu4).

fingerprints from Phashion::Image#fingerprint
-
image 1 fingerprint 1 = 3897734968558088698
image 2 fingerprint 2 = 54086765383280
image 3 fingerprint 3 = 54086765383280
image 1 fingerprint 4 = 3897734968558088698


fingerprints from Phashion::Image.get_fingerprint
-
image 1 fingerprint 1 = 3897734968558088698
image 2 fingerprint 2 = 54086765383280
image 3 fingerprint 3 = 54086765383280
image 1 fingerprint 4 = 3897734968558088698

I expect fingerprints 1 and 4 to be the same, since they are derived from the same PNG. But I expect the fingerprints of images 2 and 3 to be different, and they are the same.
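For anyone who wants to check this expectation numerically: pHash image fingerprints are 64-bit values that are normally compared by Hamming distance (the number of differing bits). Below is a minimal pure-Ruby sketch of that comparison. The helper names and the threshold of 15 are assumptions made for illustration, not phashion's API.

```ruby
# Number of differing bits between two integer fingerprints.
def fingerprint_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Hypothetical helper: treat two fingerprints as "the same image" when
# their distance is at or below a threshold. The default of 15 is an
# assumption chosen for illustration.
def probably_same_image?(a, b, threshold = 15)
  fingerprint_distance(a, b) <= threshold
end

fp1 = 3897734968558088698 # fingerprint 1 from the test output
fp2 = 54086765383280      # fingerprint 2 from the test output
fp3 = 54086765383280      # fingerprint 3 from the test output

puts fingerprint_distance(fp2, fp3) # identical values, so this prints 0
puts fingerprint_distance(fp1, fp2)
```

A distance of 0 between two visually different images, as with images 2 and 3 here, is the symptom reported in this issue.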

@toadle I should add that I got the same result on mac too (libpng/1.5.14 via homebrew).

Not sure where to go next. I'll dig into the C in the next couple of days. Any and all ideas are welcome!

Hmm, so the fingerprint of the PNGs that don't give the right fingerprint always seems to be "54086765383280". See my example above.
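Until the root cause is fixed, one defensive option is to flag that value whenever it shows up. A minimal sketch follows, assuming that 54086765383280 is only the value observed in this thread and not a documented pHash constant. The constant and helper names are made up for illustration.

```ruby
# Value observed in this thread for every affected PNG.
# Assumption: this is an empirical observation, not a documented constant.
SUSPECT_FINGERPRINT = 54086765383280

def suspect_fingerprint?(fingerprint)
  fingerprint == SUSPECT_FINGERPRINT
end

# Given a hash of filename => fingerprint, return the filenames whose
# fingerprint matches the suspect value.
def suspect_files(fingerprints)
  fingerprints.select { |_name, fp| suspect_fingerprint?(fp) }.keys
end

observed = {
  "icon.png"     => 54086765383280,
  "icon-pro.png" => 54086765383280,
  "icon-pro.jpg" => 874876298511921100
}

puts suspect_files(observed).inspect
```

Files flagged this way could be converted to JPG before fingerprinting, which is the workaround described at the top of this issue.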

I'd suspect that this is a certain PNG-type that causes this. Perhaps the root cause is already in the phash-library?
Does http://www.phash.org/demo/ give you other values for your PNG?

@westonplatter Did you have a chance to check it out?

@toadle I haven't dug into the C/C++ code yet.

Did you see anything on the pHash website (http://www.phash.org/) specifying limitations? I read through it, but did not see anything about file type compatibility. I noticed that on their demo page they only list JPEG and BMP file types. Maybe this was intentional?

I also saw a comment on another Ruby gem project stating that pHash is not compatible with PNGs.
toy/pHash#3 (comment)

Regardless, I sent an email to the pHash support group (http://www.phash.org/support/). I will update this thread with the response.

I haven't read anything about a limitation, but I just verified the "Zero data"-problem with my local problematic files. In fact I can get PNGs to work, if there is no alpha channel.

See this:

pi = Phashion::Image.new("custom1.png")
=> #<Phashion::Image:0x007f8863c19b00 @filename="/tmp/custom1.png">
pi.fingerprint
=> 3309129861787351809

pi = Phashion::Image.new("standard.png")
=> #<Phashion::Image:0x007f8865cd4ca0 @filename="/tmp/standard.png">
pi.fingerprint
=> 8963947329965812755

pi = Phashion::Image.new("with-alpha1.png")
=> #<Phashion::Image:0x007f8860890f40 @filename="/tmp/with-alpha1.png">
pi.fingerprint
=> 54086765383280


pi = Phashion::Image.new("with-alpha2.png")
=> #<Phashion::Image:0x007f8863baa9f8 @filename="/tmp/with-alpha2.png">
pi.fingerprint
=> 54086765383280


im = Magick::Image.read("/tmp/custom1.png").first
=> /tmp/custom1.png PNG 640x1136 640x1136+0+0 DirectClass 8-bit 776kb
im.alpha?
=> false

im.alpha(Magick::ActivateAlphaChannel)
=> ActivateAlphaChannel=1
im.alpha?
=> true
im.write("/tmp/custom1-alpha.png")
=> /tmp/custom1.png=>/tmp/custom1-alpha.png PNG 640x1136 640x1136+0+0 DirectClass 8-bit 736kb
pi = Phashion::Image.new("/tmp/custom1-alpha.png")
=> #<Phashion::Image:0x007f8863cbeb78 @filename="/tmp/custom1-alpha.png">
pi.fingerprint
=> 54086765383280

So it seems this is really a problem deep down in the phash-library.
It works on PNGs without alpha-channel.

I can provide the test-data, if you want.
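To find affected files up front without RMagick, the alpha channel can also be detected from the PNG header itself. In a PNG, the 8-byte signature is followed by the IHDR chunk, and the color type byte sits at offset 25; color types 4 (grayscale with alpha) and 6 (RGB with alpha) carry an alpha channel. The sketch below assumes that this is the condition that matters here. It ignores palette transparency (tRNS chunks), and the helper name is hypothetical.

```ruby
PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# PNG color types that include an alpha channel:
# 4 = grayscale with alpha, 6 = RGB with alpha.
ALPHA_COLOR_TYPES = [4, 6].freeze

# Returns true when the given bytes start with a PNG header whose IHDR
# color type includes an alpha channel. Only the first 26 bytes are needed.
# Palette images with a tRNS chunk are NOT detected by this sketch.
def png_alpha_channel?(data)
  data = data.b
  return false unless data.bytesize >= 26
  return false unless data.byteslice(0, 8) == PNG_SIGNATURE
  return false unless data.byteslice(12, 4) == "IHDR"

  ALPHA_COLOR_TYPES.include?(data.getbyte(25))
end

# Usage with a file on disk (the path is only an example):
#   png_alpha_channel?(File.binread("/tmp/with-alpha1.png", 26))
```

Files for which this returns true could then be converted or flattened before being handed to Phashion::Image.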

Have you guys considered this might be a libphash bug and seeing if there is a newer version?

Well from all I can see it IS in fact a phash-bug. Current version of phash is still the same on their website.

Has someone opened an issue with the project?

@mperham Yes. This guy: toy/pHash#3 (comment)
He also posted a patch that fixes this.

Too bad the maintainer is unresponsive. Fork and fix. Maybe you can get the maintainer to bless a new maintainer like I turned over maintenance of phashion.

@mperham I would try to patch this into phashion myself, but I've never built a gem so far. So I'm too much of a rookie here.

@westonplatter Have you seen this? toy/pHash#3 (comment) Possible fix at hand.

Phashion vendors its own copy of libphash so you just need to update the code in the tarball. It's not ideal but possible.

@toadle and @mperham I'll pull the patch into the gem's pHash, recompile, and see if it works.

@toadle The patch worked on my machine. I'll be packaging my code into a PR.

@mperham I'll set up TravisCI for this after the repo is transferred.

@toadle I talked with Evan, the pHash maintainer. He said he would pull in the patch (westonplatter/phash@ff255d2) and republish to the official pHash lib.

We will pull your patch and incorporate it for the next release.
Thank you for reporting and fixing this bug.

My thought was to provide a temporary solution via this branch, https://github.com/westonplatter/phashion/tree/temp-png-fix, and then republish the phashion gem once the official pHash update comes. Does this solution meet your needs? Feel free to suggest something else.

@westonplatter Works for me already. I decided to always convert my stuff to JPG, so I can already work now. If I could decide for phashion, I would release the fix now and release again when the new pHash release comes out, since we don't know when those guys will come around (when was their last release? 2010?).

But that's more work, I think. How could I install the branched version of the gem?
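One common way to install a gem from a branch is to point Bundler at the git repository. The Gemfile fragment below is a sketch that assumes the temp-png-fix branch linked above is still available and that the project uses Bundler. The repository URL is derived from that branch link.

```ruby
# Gemfile
source "https://rubygems.org"

# Assumption: the temp-png-fix branch mentioned in this thread still exists.
gem "phashion",
    git: "https://github.com/westonplatter/phashion.git",
    branch: "temp-png-fix"
```

After editing the Gemfile, running `bundle install` fetches and builds the gem from that branch.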