There is 1 repository under the dedup topic.
CLI utility to find near duplicate images and remove all but the best copy.
High performance rsync backup utilising BTRFS / ZFS filesystem features
Distill large-scale web page text
Golang structured logging (slog) deduplication and sorting for use with JSON logging
A C++ reimplementation of Near Duplicate Video Detection - Get a 64-bit comparable hash-value for any video (Video Hash).
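📄 [优爱酷 Visual Website and Web Page Data Collection System] Uses visual scraping technology to identify web page element types such as images, text, links, HTML, and files. Supports running JavaScript scripts, applying regular expressions, automatic scrolling, automatic pagination, and opening pop-up windows to collect data. Collection settings include automatic data deduplication, human-like intermittent pauses to avoid IP blocking, and automatic saving. Also supports browser settings such as cookies and cache, proxy rotation for collecting through censorship-circumvention connections, "category/keyword" collection, image renaming, and advanced options such as multithreaded collection. The VIP version additionally supports scheduled collection.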
Project that takes two similar zip files and dedupes files that have the same timestamp in the older file.
BenSP is a suite of parameterizable benchmarks for stream parallelism which is used to evaluate stream processing characteristics.
Remove local files that are duplicates of files in another path
Sift duplicate whitespaces away!
Python script to analyze dedup usage in btrfs