There are 4 repositories under the data-deduplication topic.
Self-contained C# library for data deduplication using SQLite
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
General deduplication engine for JDBC sources with output to JDBC/CSV targets
A Java project that splits data into blocks using hashing techniques and removes duplicate blocks to save cloud storage. The project also uses the CloudSim framework to simulate cloud storage.
Practical backups. The Unix toolkit way.
A data deduplication implementation based on a client-server architecture
A calculator for the storage and transmission of deduplicated data, with results presented in charts and tables
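
Several of the projects listed above describe block-level deduplication: split the data into blocks, hash each block, and keep only one copy per distinct hash. The following is a minimal sketch of that general idea, not the code of any listed repository. The function names `dedup_blocks` and `restore`, the fixed block size, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def dedup_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each distinct block once.

    Hypothetical helper: returns (store, recipe), where store maps a
    SHA-256 hex digest to the block bytes and recipe is the ordered list
    of digests needed to rebuild the original input.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Only the first occurrence of a block is stored.
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes by concatenating blocks in recipe order."""
    return b"".join(store[digest] for digest in recipe)
```

With an 8-byte block size, an input made of the blocks A, B, A would be stored as two unique blocks plus a three-entry recipe. Real systems often use content-defined (variable-size) chunking instead of fixed offsets so that an insertion near the start of a file does not shift every later block.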