westonplatter / phashion

Ruby wrapper around pHash, the perceptual hash library for detecting duplicate multimedia files

pHashion and text duplication?

phol56 opened this issue · comments

What would I need to do to make it work for comparing text or HTML files? I can see from the pHash documentation that the capability exists, but the gem seems to be purely image-focused. Unsurprisingly, Phashion::Image.new("/some_page.html") fails!

Where would I need to start looking to monkey-patch this in?

Correct, this gem is focused on image comparison. I looked at the pHash docs (http://www.phash.org/docs/design.html) and didn't see any mention of text comparison. Can you provide a link?

Rather than modifying this library to add text comparison, I would suggest using a library built specifically for text comparison. It's probably going to be more feature-rich than phashion might ever become. Diffy looks like a great option.
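
To give a rough idea of what a text-specific approach can look like, here is a minimal sketch in plain Ruby (standard library only) that scores two strings by Jaccard similarity over word shingles. This is not Diffy's API and not pHash's text hash; the helper names `shingles` and `jaccard_similarity` and the default shingle size of 3 are assumptions made up for illustration.

```ruby
require "set"

# Hypothetical helper: split text into overlapping runs of `size`
# lowercase words ("shingles"). Texts shorter than `size` words
# produce a single shingle containing all of their words.
def shingles(text, size = 3)
  words = text.downcase.scan(/\w+/)
  return Set.new([words.join(" ")]) if words.length < size

  words.each_cons(size).map { |run| run.join(" ") }.to_set
end

# Hypothetical helper: Jaccard similarity |A & B| / |A | B| of the two
# shingle sets, from 0.0 (nothing shared) to 1.0 (identical sets).
def jaccard_similarity(a, b)
  set_a = shingles(a)
  set_b = shingles(b)
  (set_a & set_b).size.to_f / (set_a | set_b).size
end
```

Calling `jaccard_similarity(File.read("a.txt"), File.read("b.txt"))` (file names are placeholders) would give a score you could compare against a threshold of your choosing. For HTML input you would probably want to strip the markup first.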

@phol56 I'll close this issue since Diffy looks like a better alternative. Please reopen it if I overlooked something.