healthonrails / annolid

An annotation and instance segmentation-based multiple animal tracking and behavior analysis package.

--extract_frames argument miscounts

shamavir opened this issue · comments

When executing annolid/main.py -v 190116.wmv --extract_frames=20 --algo=uniform, 40 frames are extracted as JPEGs, rather than the expected 20 frames. This is on Windows 10 / Anaconda.

It seems that OpenCV reports a total of 26955 frames via int(cap.get(7)), but the video actually contains 52395 frames.
The reported count is not reliable across videos and formats, so for some videos the number of extracted frames may not match the number requested.
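
One way to check this kind of mismatch is to compare the count OpenCV reports with the number of frames it can actually decode. The following is a minimal sketch, not Annolid's implementation: `count_readable_frames` and `compare_counts` are hypothetical helper names, and the counting loop only assumes an object whose `read()` returns `(ok, frame)`, as `cv2.VideoCapture` does.

```python
def count_readable_frames(cap):
    """Count frames by calling read() until it reports failure.

    `cap` is any object whose read() returns (ok, frame), such as
    cv2.VideoCapture. This decodes the whole video, so it is slow.
    """
    count = 0
    while True:
        ok, _frame = cap.read()
        if not ok:
            break
        count += 1
    return count


def compare_counts(video_path):
    """Return (reported, readable) frame counts for a video file."""
    # Imported here so the counting helper itself has no OpenCV dependency.
    import cv2

    cap = cv2.VideoCapture(video_path)
    try:
        # cv2.CAP_PROP_FRAME_COUNT has the numeric value 7, as in cap.get(7).
        reported = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        readable = count_readable_frames(cap)
    finally:
        cap.release()
    return reported, readable
```

Note that a failed `read()` can also mean a decoding error partway through the file, so the readable count is a lower bound on what the container holds rather than a guaranteed total.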

Hm, is this a bug in OpenCV that should be pushed up, or is this because video files created by different people using different tools contain wrong metadata about themselves? If the latter, could we write a script that fixes at least this one issue by determining the actual number of frames in the video and updating the metadata accordingly?

This problem persists in the current build. In the attached example, specifying 10 frames to be extracted seems to extract 5 frames (instead of 10) based on runtime output, and actually extracts six frames (including frame 0). (The video specified has been uploaded to a new Debugging folder on Cornell Box). Please see anaconda shell screenshot attached.

Annolid error 01, 17 Sept 2020

There is no nb_frames metadata in the header for this video novelctrl.mkv.
{'index': 0,
'codec_name': 'h264',
'codec_long_name': 'H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10',
'profile': 'High',
'codec_type': 'video',
'codec_time_base': '1/60',
'codec_tag_string': '[0][0][0][0]',
'codec_tag': '0x0000',
'width': 1280,
'height': 1024,
'coded_width': 1280,
'coded_height': 1024,
'has_b_frames': 2,
'sample_aspect_ratio': '0:1',
'display_aspect_ratio': '0:1',
'pix_fmt': 'yuv420p',
'level': 32,
'chroma_location': 'left',
'field_order': 'progressive',
'refs': 1,
'is_avc': 'true',
'nal_length_size': '4',
'r_frame_rate': '30/1',
'avg_frame_rate': '30/1',
'time_base': '1/1000',
'start_pts': 0,
'start_time': '0.000000',
'bits_per_raw_sample': '8',
'disposition': {'default': 1,
'dub': 0,
'original': 0,
'comment': 0,
'lyrics': 0,
'karaoke': 0,
'forced': 0,
'hearing_impaired': 0,
'visual_impaired': 0,
'clean_effects': 0,
'attached_pic': 0,
'timed_thumbnails': 0},
'tags': {'ENCODER': 'Lavc58.54.100 libx264',
'DURATION': '00:05:26.133000000'}}

So OpenCV used the duration in seconds * FPS to calculate the number of frames.
The following command returns the video's duration in seconds.
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 novelctrl.mkv
326.133000 * 30 (FPS) = 9783.99
This matches the value returned by the following method:

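import cv2
video_path = "novelctrl.mkv"
cap = cv2.VideoCapture(video_path)
# Property 7 is cv2.CAP_PROP_FRAME_COUNT, the frame count reported by the backend
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()
# n_frames is 9784 for this video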
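
The same arithmetic can be reproduced from the ffprobe stream metadata shown above. This is only a sketch under stated assumptions: `duration_to_seconds` and `estimate_frame_count` are hypothetical names, the stream dict is assumed to carry a `DURATION` tag in `HH:MM:SS.fraction` form, and the result is an estimate, not a verified count.

```python
from fractions import Fraction


def duration_to_seconds(duration_tag):
    """Convert an 'HH:MM:SS.fraction' tag into exact seconds."""
    hours, minutes, seconds = duration_tag.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + Fraction(seconds)


def estimate_frame_count(stream):
    """Prefer nb_frames when present, else fall back to duration * frame rate."""
    if "nb_frames" in stream:
        return int(stream["nb_frames"])
    seconds = duration_to_seconds(stream["tags"]["DURATION"])
    fps = Fraction(stream["avg_frame_rate"])  # e.g. '30/1'
    return round(seconds * fps)


# Values copied from the metadata for novelctrl.mkv.
stream = {"avg_frame_rate": "30/1", "tags": {"DURATION": "00:05:26.133000000"}}
print(estimate_frame_count(stream))  # 9784
```

Some streams report `avg_frame_rate` as `0/0`, which would raise `ZeroDivisionError` here, so a real implementation would need to guard against that case.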

However, FFmpeg reports only 6834 frames with the following command. That's why only 6 frames were saved.
Should we require users to install ffmpeg to double-check the frame count?

ffmpeg -discard nokey -i novelctrl.mkv -map 0:v:0 -c copy -f null -

ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
built with Apple LLVM version 7.0.2 (clang-700.1.81)
configuration: --prefix=/usr/local/Cellar/ffmpeg/3.4 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Input #0, matroska,webm, from '/Users/chenyang/Downloads/novelctrl.mkv':
Metadata:
ENCODER : Lavf58.29.100
Duration: 00:05:26.13, start: 0.000000, bitrate: 2989 kb/s
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x1024, 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
Metadata:
ENCODER : Lavc58.54.100 libx264
DURATION : 00:05:26.133000000
Output #0, null, to 'pipe:':
Metadata:
encoder : Lavf57.83.100
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1280x1024, q=2-31, 30 fps, 30 tbr, 1k tbn, 1k tbc (default)
Metadata:
ENCODER : Lavc58.54.100 libx264
DURATION : 00:05:26.133000000
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
frame= 6834 fps=0.0 q=-1.0 Lsize=N/A time=00:05:26.00 bitrate=N/A speed=2.37e+03x
video:118961kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Solved by reservoir sampling.
Reference:
https://en.wikipedia.org/wiki/Reservoir_sampling
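
For readers unfamiliar with the technique, here is a minimal sketch of reservoir sampling (Algorithm R) applied to a stream of frames whose length is not known in advance. It is an illustration only and may differ from the code actually committed to Annolid; `reservoir_sample` and its `seed` parameter are hypothetical.

```python
import random


def reservoir_sample(frames, k, seed=None):
    """Pick k (index, frame) pairs uniformly at random in a single pass.

    `frames` can be any iterable; its length is never needed, so a wrong
    frame count in the video metadata does not matter.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, frame in enumerate(frames):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append((index, frame))
        else:
            # Keep the new item with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = (index, frame)
    # Return the picks in frame order.
    return sorted(reservoir, key=lambda pair: pair[0])


# Usage with a stand-in stream of 6834 "frames".
picked = reservoir_sample(range(6834), 10, seed=0)
print([index for index, _ in picked])
```

Every frame that can be read has the same chance of being selected, and if the stream holds fewer than k frames, all of them are returned.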

OK, great. I'll give it a try. We'll definitely need to deal with this situation, given that users will have different environments and apparently many video files lack nb_frames metadata.

Why can we not just use the cv2 method, which seems more reliable, to count frames if there is no nb_frames value?

I'm intrigued by your reservoir sampling solution. This won't be really "uniform" sampling through the file, though, will it? More a form of random selection?
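
To make the distinction concrete: evenly spaced sampling needs a trustworthy total up front, whereas reservoir sampling does not. A minimal sketch of the evenly spaced variant, assuming the total has been verified by actually reading the video (`evenly_spaced_indices` is a hypothetical name, not an Annolid function):

```python
def evenly_spaced_indices(n_frames, k):
    """Return up to k frame indices spread evenly over [0, n_frames)."""
    if n_frames <= 0 or k <= 0:
        return []
    k = min(k, n_frames)
    return [i * n_frames // k for i in range(k)]


# With the readable count, every index can be decoded.
print(evenly_spaced_indices(6834, 10))
# With the overestimated count, some indices point past the last readable frame.
print(evenly_spaced_indices(9784, 10))
```

Under this formula, three of the ten indices computed from 9784 are at or beyond 6834, which illustrates how an inflated count can lead to fewer saved frames than requested.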

I don't think we should require users to install ffmpeg for this reason. Especially if it is giving incorrect information anyway.

OpenCV can only read 6834 frames from the video. I have not figured out the exact reason.
Yes, I will change "uniform" to "random".