There is 1 repository under the de-duplication topic.
All-in-one text de-duplication
CD4Py: Code De-Duplication for Python
Container image converter aiming to minimize image size and speed up boot time dramatically with block-level de-duplication and lazy-pull technology.
Preprocesses query logs for use by suggesters such as Most Popular Suggester (MPS).
A collection of algorithms to generate a signature/fingerprint/hash in order to be used for detecting duplicate/near duplicate documents.
Efficient Event Streamlining and Dynamic De-Duplication Across Message Brokers - A Technology-Agnostic approach
A miscellaneous project of Python-based functions to track, survey, and manage files from multiple systems
A secure image storage application written in Python