An easy-to-use tool to extract frames from videos and store them in a database. Basically, this is a Python wrapper around ffmpeg that additionally stores the frames in a database. Some basic capabilities for detecting and removing duplicates have also been added, so you can quite easily extract 100 different frames from a video, or, if you scrape a large number of images from the net, resize and dedupe them before placing them into an image database.
This is a fork and combination of a number of projects:
- https://github.com/forwchen/vid2frame
- https://github.com/jinyu121/video2frame/blob/master/video2frame.py
- https://github.com/rmccorm4/PyTorch-LMDB/blob/master/folder2lmdb.py
- Extracting frames from large video datasets (usually 10k ~ 100k, hundreds of GBs on disk) is tedious; automate it.
- Storing millions of frames on disk makes subsequent processing SLOW.
- Common mistakes I once made:
  - Decoding all frames (using scikit-video) and storing them in a LARGE .npy file, a nice way to blow up the disk.
  - Extracting all frames using ffmpeg and writing them to disk. Takes foreeeeever to move or delete.
  - Extracting JPEG frames using ffmpeg but ignoring the JPEG quality. For deep learning and computer vision, good image quality (JPEG quality around 95) is required.
- Good practice in my opinion:
  - Add -qscale:v 2 to the ffmpeg command.
  - Store extracted frames in a database, LMDB or HDF5.
  - (Optional) Use Tensorpack dataflow to accelerate reading from the database.
  - Suggestions are welcome.
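As a rough illustration of the -qscale:v advice above, the sketch below assembles an ffmpeg argument list in Python. The helper name build_ffmpeg_cmd and its parameters are hypothetical and not taken from vid2frame.py; only the ffmpeg options themselves (-i, -vf fps=..., -qscale:v) are real, and the actual script may build its command differently.

```python
import shlex


def build_ffmpeg_cmd(video_path, out_pattern, interval=None, qscale=2):
    """Build an ffmpeg argument list that writes JPEG frames.

    Hypothetical helper for illustration; not part of vid2frame.py.
    """
    cmd = ["ffmpeg", "-i", video_path]
    if interval is not None:
        # One frame every `interval` seconds, i.e. 1/interval frames per second.
        cmd += ["-vf", f"fps=1/{interval}"]
    # For JPEG output, a lower -qscale:v value means higher quality.
    cmd += ["-qscale:v", str(qscale), out_pattern]
    return cmd


cmd = build_ffmpeg_cmd("video.mp4", "frames/%08d.jpg", interval=2)
# Print the command as it would be typed in a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual file names.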
Conda (Linux):
conda env create -f environment.yml
Or using pip:
pip install -r requirements.txt
usage: vid2frame.py [-h] --db_name DB_NAME --db_type {LMDB,HDF5,FILE,PKL}
[--tmp_dir TMP_DIR] [-s SHORT] [-H HEIGHT] [-W WIDTH]
[-k SKIP] [-n NUM_FRAME] [-r INTERVAL] [-d NO_DUPLICATES]
[--hash_size HASH_SIZE]
[--hash_alg {average_hash,phash,dhash,whash}]
video_path
positional arguments:
video_path The video path (single file or dir)
optional arguments:
-h, --help show this help message and exit
--db_name DB_NAME The database to store extracted frames
--db_type {LMDB,HDF5,FILE,PKL}
Type of the database
--tmp_dir TMP_DIR Temporary folder
-s SHORT, --short SHORT
Keep the aspect ration and scale the shorter side to s
-H HEIGHT, --height HEIGHT
The resized height
-W WIDTH, --width WIDTH
The resized width
-k SKIP, --skip SKIP Only store frames with (ID-1) mod skip==0, frame ID
starts from 1
-n NUM_FRAME, --num_frame NUM_FRAME
Uniformly sample n frames, this will override --skip
-r INTERVAL, --interval INTERVAL
Extract one frame every r seconds
-d NO_DUPLICATES, --no_duplicates NO_DUPLICATES
Remove duplicates within threshold of another image
--hash_size HASH_SIZE
For duplicate detection the size the image will be
resized to for comparison
--hash_alg {average_hash,phash,dhash,whash}
- The frames are stored as strings of their binary content, i.e. they are NOT decoded. Both LMDB and HDF5 are key-value stores; the keys are in the format video_name/frame_id (assuming no two videos have the same name).
- The frames are in JPEG format, with JPEG quality ~95. Note the -qscale:v 2 option in vid2frame.py. This is important for subsequent deep learning tasks.
- The database to use is either LMDB or HDF5; choose one according to:
  - Reading from HDF5 is convenient. If you do not plan to use Tensorpack (which currently does not support HDF5 well), always choose HDF5.
  - LMDB integrates better with Tensorpack, but reading from it is less flexible (though much faster than HDF5).
- Resizing options (exclusive):
  - Resize the shorter edge and keep aspect ratio (the longer edge adapts) (--short)
  - Resize to specific height & width (--height --width)
- Sampling options (exclusive):
  - Keep one frame every k frames (default 1, i.e. keep every frame) (--skip)
  - Uniformly sample n frames (--num_frame). For example: if there are 10 frames, --skip=2 will sample frames 1,3,5,7,9 and --num_frame=4 will sample frames 1,4,7,10.
  - Sample one frame every r seconds (--interval), i.e. at 1/r FPS. For r==1 it's 1 FPS, and for r==2 it's 0.5 FPS.
- Duplicate removal options:
  - Requires testing the two parameters no_duplicates and hash_size based on video size and frame similarity
  - For 1920x1080 (HD) video with minimal duplicate removal try no_duplicates=0.99 and hash_size=32
  - For aggressive duplicate removal on HD video try no_duplicates=0.98 and hash_size=8
  - Tip: try exporting to FILE type and then using ffmpeg to make a video (at 25 fps): ffmpeg -i path/to/frames/%08d.jpg -r 25 test_frames.mp4
- Video files are identified by extension, currently recognizing ['.mp4', '.avi', '.flv', '.mkv', '.webm', '.mov'].
- Videos with the same name (without extension) are considered duplicates. Only one of them will be processed.
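The sampling rules in the notes above can be sketched in a few lines of Python. This is a minimal illustration that reproduces the 10-frame example, not the code from vid2frame.py; the function names are hypothetical, and the rounding used for uniform sampling is an assumption that happens to match the documented result.

```python
def sample_by_skip(num_frames, skip=1):
    """Frame IDs (1-based) kept by --skip: those with (ID - 1) % skip == 0."""
    return [fid for fid in range(1, num_frames + 1) if (fid - 1) % skip == 0]


def sample_uniform(num_frames, n):
    """Pick n frame IDs (1-based) spread evenly from the first to the last frame."""
    if n >= num_frames:
        return list(range(1, num_frames + 1))
    if n == 1:
        return [1]
    # Assumed spacing rule: equal steps between frame 1 and the final frame.
    step = (num_frames - 1) / (n - 1)
    return [1 + round(i * step) for i in range(n)]


def interval_to_fps(r):
    """--interval r means one frame every r seconds, i.e. 1/r FPS."""
    return 1.0 / r


print(sample_by_skip(10, 2))   # [1, 3, 5, 7, 9]
print(sample_uniform(10, 4))   # [1, 4, 7, 10]
print(interval_to_fps(2))      # 0.5
```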
python vid2frame.py path/to/my/video.mp4 --db_name my_db.lmdb --db_type LMDB
python vid2frame.py path/to/my/video.mp4 --db_name my_frames --db_type FILE -W 512 -H 512
python vid2frame.py path/to/my/video.mp4 --db_name my_frames --db_type FILE -W 512 -H 512 --no_duplicates 0.98 --hash_size 32
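The last command above turns on duplicate removal. One plausible reading of --no_duplicates and --hash_size is sketched below in pure Python: each frame is reduced to a small grayscale grid, hashed, and dropped if its hash matches an already kept frame in at least the threshold fraction of bits. This interpretation is an assumption, as are all function names here; the --hash_alg choices suggest the tool relies on a perceptual hashing library rather than hand-written code like this.

```python
def average_hash(gray):
    """Average hash of a small grayscale grid (rows of 0-255 ints).

    Assumes the frame was already resized to hash_size x hash_size pixels.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [p > mean for p in pixels]


def similarity(hash_a, hash_b):
    """Fraction of matching bits; 1.0 means identical hashes."""
    matches = sum(a == b for a, b in zip(hash_a, hash_b))
    return matches / len(hash_a)


def drop_duplicates(frames, threshold):
    """Keep a frame only if every kept frame is less than `threshold` similar."""
    kept, kept_hashes = [], []
    for name, gray in frames:
        h = average_hash(gray)
        if all(similarity(h, k) < threshold for k in kept_hashes):
            kept.append(name)
            kept_hashes.append(h)
    return kept


# Three toy 2x2 "frames": b is a near copy of a, c has a different layout.
frames = [
    ("a", [[10, 200], [10, 200]]),
    ("b", [[12, 198], [11, 201]]),
    ("c", [[200, 10], [10, 200]]),
]
print(drop_duplicates(frames, threshold=0.98))  # ['a', 'c']
```

Under this reading, a lower threshold or a smaller hash grid makes more frames count as duplicates, which is consistent with the "aggressive" settings suggested in the notes.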
test_db.py provides sample code to iterate over, read, and decode frames in databases; it also checks for broken images.
usage: test_db.py [-h] [--db_name DB_NAME] [--db_type {LMDB,HDF5,FILE,PKL}]
optional arguments:
-h, --help show this help message and exit
--db_name DB_NAME The database to store extracted frames
--db_type {LMDB,HDF5,FILE,PKL}
Type of the database
- Opening images from string buffer:
img = Image.open(BytesIO(v))
- Reading string from HDF5 db:
s = np.asarray(db_vid[fid]).tobytes()
python test_db.py --db_name frames-1.lmdb --db_type LMDB
The script outputs the number of frames in the database and their sizes. It also shows the last frame in the db and the time taken to iterate over the whole database.
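For a quick sanity pass that does not decode anything, the stored bytes can be checked for the JPEG start and end markers. The sketch below is a hypothetical, cheaper stand-in for the broken-image check in test_db.py, not a copy of it: a plain dict plays the role of the database, the byte strings are made up, and the 8-digit frame IDs are an assumption.

```python
def split_key(key):
    """Split a 'video_name/frame_id' key into its two parts."""
    video_name, frame_id = key.rsplit("/", 1)
    return video_name, frame_id


def looks_like_jpeg(buf):
    """Structural check only: JPEG data starts with FFD8 and ends with FFD9."""
    return len(buf) >= 4 and buf[:2] == b"\xff\xd8" and buf[-2:] == b"\xff\xd9"


# A dict stands in for the LMDB/HDF5 database; values are fake, not real frames.
fake_db = {
    "my_video/00000001": b"\xff\xd8" + b"\x00" * 16 + b"\xff\xd9",
    "my_video/00000002": b"\xff\xd8" + b"\x00" * 16,  # truncated: no end marker
}

broken = [key for key, buf in fake_db.items() if not looks_like_jpeg(buf)]
print(broken)
print(split_key("my_video/00000001"))
```

A frame that passes this check can still fail to decode, so fully decoding with Image.open(BytesIO(v)) remains the more reliable test.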
- Python 3.7
- FFmpeg: must be installed separately (on Ubuntu or other platforms).
- Python libraries: pip install -r requirements.txt
- RuntimeError: Unable to create link (name already exists)
  This is caused by writing duplicate frames to a non-empty HDF5 database.