1132719438 / vhash

A C++ reimplementation of Near Duplicate Video Detection - Get a 64-bit comparable hash-value for any video (Video Hash).

The hash tool for duplicate video and image detection

Introduction

vhash is a C++ reimplementation of videohash for detecting near-duplicate videos. It takes a video or image file as input and generates a comparable 64-bit hash value.
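The README does not spell out how the 64-bit value is computed. Because FFTW is listed for the discrete cosine transform, one plausible reading is a pHash-style recipe: reduce the image to 32x32 grayscale, take the 2-D DCT-II, keep the 8x8 lowest-frequency coefficients, and set one bit per coefficient that lies above their median. The sketch below assumes exactly that recipe and an already resized input (with OpenCV that step would be `cv::cvtColor` plus `cv::resize`); how video frames are combined is not described here and is left out. `dct_low` and `dct_hash` are hypothetical names, and a direct DCT replaces FFTW so the example has no dependencies. FFTW's `FFTW_REDFT10` kind computes the same unnormalized DCT-II.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed pHash-style parameters (not confirmed as vhash's own).
constexpr int kSize = 32;  // side of the pre-resized grayscale input
constexpr int kLow = 8;    // kLow x kLow low frequencies -> 64 bits

// First kLow outputs of the unnormalized 1-D DCT-II of kSize samples read
// with the given stride: Y[k] = 2 * sum_j x[j] * cos(pi * (2j + 1) * k / (2 * kSize)).
// This matches the definition of FFTW's FFTW_REDFT10 transform kind.
std::array<double, kLow> dct_low(const double* x, int stride) {
    const double pi = std::acos(-1.0);
    std::array<double, kLow> y{};
    for (int k = 0; k < kLow; ++k) {
        double sum = 0.0;
        for (int j = 0; j < kSize; ++j)
            sum += x[j * stride] * std::cos(pi * (2 * j + 1) * k / (2.0 * kSize));
        y[k] = 2.0 * sum;
    }
    return y;
}

// Hypothetical 64-bit hash of a kSize x kSize grayscale image stored row-major.
std::uint64_t dct_hash(const std::vector<double>& gray) {
    assert(gray.size() == static_cast<std::size_t>(kSize) * kSize);

    // Row pass: low-frequency DCT of each row -> kSize x kLow, row-major.
    std::vector<double> rows(static_cast<std::size_t>(kSize) * kLow);
    for (int r = 0; r < kSize; ++r) {
        const auto y = dct_low(gray.data() + r * kSize, 1);
        std::copy(y.begin(), y.end(), rows.begin() + r * kLow);
    }

    // Column pass over the kLow kept columns -> kLow x kLow block.
    std::array<double, kLow * kLow> coeffs{};
    for (int c = 0; c < kLow; ++c) {
        const auto y = dct_low(rows.data() + c, kLow);
        for (int k = 0; k < kLow; ++k) coeffs[k * kLow + c] = y[k];
    }

    // Median of the 64 coefficients (mean of the two middle values).
    std::array<double, kLow * kLow> sorted = coeffs;
    const auto mid = sorted.begin() + kLow * kLow / 2;
    std::nth_element(sorted.begin(), mid, sorted.end());
    const double median = (*std::max_element(sorted.begin(), mid) + *mid) / 2.0;

    // One bit per coefficient: set when it lies above the median.
    std::uint64_t hash = 0;
    for (int i = 0; i < kLow * kLow; ++i)
        if (coeffs[i] > median) hash |= std::uint64_t{1} << i;
    return hash;
}
```

Multiplying every pixel by the same factor scales all coefficients and their median together, so under this recipe the hash does not change with a uniform contrast (gain) change; small edits move only a few coefficients across the median, which keeps near-duplicates within a few bits of each other.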


Build vhash

Requirements

  • A C++ compiler that supports C++14
  • CMake >= 3.11

Dependencies

External

  • opencv for image decoding & resizing
  • ffmpeg for video decoding & frame extraction
  • fftw for discrete cosine transform (DCT)
  • sqlite3 for file hash value caching
  • spdlog for logging

CentOS

sudo yum install opencv-devel ffmpeg-devel fftw-devel sqlite-devel spdlog-devel

Ubuntu

sudo apt install libopencv-dev libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev
sudo apt install libfftw3-dev libsqlite3-dev libspdlog-dev

macOS

brew install opencv@4 ffmpeg@5 fftw sqlite spdlog
brew link ffmpeg@5

Included

Compile

git clone https://github.com/1132719438/vhash.git
cd vhash
make
bin/vhash hash tests/testdata/lena.png

Development

Dependencies

CentOS

sudo yum install gtest-devel google-benchmark-devel

Ubuntu

sudo apt install libgtest-dev libbenchmark-dev

macOS

brew install googletest google-benchmark

Features

  • Generate the hash value of a single file or of every file in a directory.
  • Cache each file's hash value in a database (SQLite) to speed up repeated hash generation.
  • Find duplicate video or image files in a directory.
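Comparable 64-bit hashes are usually compared by Hamming distance, the number of bit positions in which two values differ; near-duplicates differ in only a few bits. The README does not state vhash's comparison rule or threshold, so the helper names below and the default threshold of 10 bits are illustrative assumptions:

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Number of bit positions in which two 64-bit hashes differ (0..64).
inline int hamming_distance(std::uint64_t a, std::uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Hypothetical rule: near-duplicates differ in at most `max_bits` bits.
// The default of 10 is an illustrative value, not one taken from vhash.
inline bool is_near_duplicate(std::uint64_t a, std::uint64_t b, int max_bits = 10) {
    assert(max_bits >= 0 && max_bits <= 64);
    return hamming_distance(a, b) <= max_bits;
}
```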

Usage

Hash

Generating hashes for video or image files

Usage: vhash hash [OPTIONS] path  

Positionals:  
path TEXT:PATH(existing) REQUIRED file or directory path  

Options:  
-h,--help                   Print this help message and exit  
-e,--ext TEXT ...           file extension filter (i.e. -e mp4,mkv)  
-c,--cache TEXT             cache file or url  
-o,--output TEXT            output file  
-C,--use-cache              use cache  
-r,--recursive              recursively find files  
-P,--no-progress            not print progress bar

bin/vhash hash -C -o hash.txt some_dir_path
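With `-C`, previously computed hashes are reused from the cache. The README does not describe when a cached value counts as stale; a common rule, assumed here, is to reuse it only while the file's size and modification time still match what was recorded. The sketch keeps entries in an in-memory map instead of SQLite, and `FileStamp`, `HashCache` and `cached_hash` are hypothetical names (C++17, for `std::optional`):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical file metadata recorded alongside each cached hash.
struct FileStamp {
    std::uint64_t size;  // file size in bytes
    std::int64_t mtime;  // last-modification time, e.g. seconds since the epoch
};

inline bool operator==(const FileStamp& a, const FileStamp& b) {
    return a.size == b.size && a.mtime == b.mtime;
}

// In-memory stand-in for a persistent (e.g. SQLite-backed) hash cache.
class HashCache {
public:
    // Returns the cached hash only if the file looks unchanged since it was stored.
    std::optional<std::uint64_t> lookup(const std::string& path, const FileStamp& now) const {
        auto it = entries_.find(path);
        if (it == entries_.end() || !(it->second.stamp == now)) return std::nullopt;
        return it->second.hash;
    }

    void store(const std::string& path, const FileStamp& stamp, std::uint64_t hash) {
        entries_[path] = Entry{stamp, hash};
    }

private:
    struct Entry {
        FileStamp stamp;
        std::uint64_t hash;
    };
    std::unordered_map<std::string, Entry> entries_;
};

// Reuses a fresh cached hash; otherwise computes it with `compute(path)` and caches it.
template <typename HashFn>
std::uint64_t cached_hash(HashCache& cache, const std::string& path,
                          const FileStamp& stamp, HashFn compute) {
    if (auto hit = cache.lookup(path, stamp)) return *hit;
    const std::uint64_t h = compute(path);
    cache.store(path, stamp, h);
    assert(cache.lookup(path, stamp) == h);  // the new entry must now be a hit
    return h;
}
```

Keying on size and modification time decides freshness without reading the file at all; a content checksum would be stricter but requires reading every file on each run.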

Cache

Operating on the hash cache

Usage: vhash cache [OPTIONS] [path]  

Positionals:  
path TEXT                     full file path  

Options:  
-h,--help                     Print this help message and exit  
-c,--cache TEXT               cache file or url  
-f,--find                     find cache item  
-d,--del                      delete cache item  
-C,--clear                    clear all hash cache  
-p,--pure                     pure expired hash cache  
-P,--pure-period INT [604800] pure period in seconds

bin/vhash cache -f some_file_path
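The `-p,--pure` option is described as removing expired cache entries, with `-P,--pure-period` defaulting to 604800 seconds (7 days). How an entry's age is measured is not documented; the sketch assumes each entry records the Unix time it was last written, and `Timestamps` and `purge_expired` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Default of -P,--pure-period: 604800 s = 7 * 24 * 3600 (one week).
constexpr std::int64_t kDefaultPurgePeriod = 604800;
static_assert(kDefaultPurgePeriod == 7 * 24 * 3600, "one week in seconds");

// Hypothetical cache view: full path -> Unix time (s) the entry was last written.
using Timestamps = std::map<std::string, std::int64_t>;

// Erases entries older than `period` seconds as of `now`; returns the number erased.
std::size_t purge_expired(Timestamps& entries, std::int64_t now,
                          std::int64_t period = kDefaultPurgePeriod) {
    assert(period >= 0);
    std::size_t erased = 0;
    for (auto it = entries.begin(); it != entries.end();) {
        if (now - it->second > period) {
            it = entries.erase(it);  // map::erase returns the next iterator
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

Against a SQLite-backed cache the same rule would more naturally be a single `DELETE` statement with a timestamp condition than a loop over entries.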

Dup

Finding duplicate video or image files

Usage: vhash dup [OPTIONS] [path]  

Positionals:  
path TEXT:PATH(existing)    file or directory path  

Options:  
-h,--help                   Print this help message and exit  
-e,--ext TEXT ...           file extension filter (i.e. -e mp4,mkv)  
-c,--cache TEXT             cache file or url  
-o,--output TEXT            output file  
-C,--use-cache              use cache  
-r,--recursive              recursively find files  
-P,--no-progress            not print progress bar

bin/vhash dup -C -o dup.txt some_dir_path
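The README does not say how `vhash dup` decides that files are duplicates or how it forms groups. One straightforward approach, assumed here, is to compare every pair of hashes, link pairs within a Hamming-distance threshold, and collect linked files with a union-find (disjoint-set) structure. `HashedFile`, `DisjointSet` and `find_duplicates` are hypothetical names, and the threshold is the caller's choice:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// A file path paired with its 64-bit hash.
struct HashedFile {
    std::string path;
    std::uint64_t hash;
};

// Minimal union-find (disjoint-set) with path halving.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// Groups files whose hashes differ in at most `max_bits` bits (transitively).
// Only groups with two or more files are returned; members keep input order.
std::vector<std::vector<std::string>> find_duplicates(
        const std::vector<HashedFile>& files, int max_bits) {
    assert(max_bits >= 0 && max_bits <= 64);
    DisjointSet sets(files.size());
    for (std::size_t i = 0; i < files.size(); ++i)
        for (std::size_t j = i + 1; j < files.size(); ++j)
            if (static_cast<int>(std::bitset<64>(files[i].hash ^ files[j].hash).count()) <= max_bits)
                sets.unite(i, j);

    std::map<std::size_t, std::vector<std::string>> by_root;  // root -> member paths
    for (std::size_t i = 0; i < files.size(); ++i)
        by_root[sets.find(i)].push_back(files[i].path);

    std::vector<std::vector<std::string>> groups;
    for (auto& entry : by_root)
        if (entry.second.size() > 1) groups.push_back(std::move(entry.second));
    return groups;
}
```

Grouping is transitive: if A is close to B and B is close to C, all three land in one group even when A and C exceed the threshold. The pairwise loop is O(n²), which is acceptable for a directory of moderate size; for very large collections an index such as a BK-tree would avoid comparing every pair.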

Credits


License

License: MIT

Copyright (c) 2023 Leo. See LICENSE for details.
