Please set up a python virtual environment and install the packages listed in requirements.txt.
# set up a python3 virtualenv environment (the directory name "venv" below is arbitrary)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Use youtube-dl to download a video.
- Install youtube-dl, e.g.
  pip install youtube-dl
- Use
  youtube-dl -F <youtube URL>
  to list all formats, and choose a format id (e.g. 96, 137, best).
- If it is a live video, use the following command:
  ffmpeg -i $(youtube-dl -f {format_id} -g <youtube URL>) -c copy -t 00:20:00 {VIDEONAME}.ts
- If it is not a live video, run:
  ffmpeg -i $(youtube-dl -f {format_id} -g <youtube URL>) -c copy {VIDEONAME}.mp4
- Or use streamlink to download a video.
- Use ffmpeg to transform the video and extract the frames, e.g.
  ffmpeg -i <Video Filename> %06d.jpg -hide_banner
- The DNN models used in this project are downloaded from the TensorFlow Model Zoo.
- Models used in this project:
- Labels are in COCO format. Labels used in this project (a small lookup sketch follows this list):
- 1 person
- 3 car
- 6 bus
- 8 truck
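These ids follow the standard COCO label map used by the Model Zoo detectors. As a small convenience sketch (the table and helper name below are not from this repo), they can be kept in a lookup table:

```python
# COCO class ids kept in this project (see the label list above).
LABELS_OF_INTEREST = {1: "person", 3: "car", 6: "bus", 8: "truck"}

def keep_detection(class_id: int) -> bool:
    """True if a detection belongs to one of the classes used in this project."""
    return int(class_id) in LABELS_OF_INTEREST
```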
- The object detection results can be generated using object_detection.
The video dataset wrapper and frame extraction scripts are in videos. Please check here.
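As a rough illustration of how per-frame detection results could be produced from the extracted frames, here is a minimal sketch assuming a TF2 SavedModel export from the Model Zoo. The model path, output file name, and score threshold below are assumptions; the actual scripts in this repo may load models and write results differently.

```python
import glob
import numpy as np
import tensorflow as tf
from PIL import Image

# Hypothetical path to a detector exported from the TensorFlow Model Zoo.
detect_fn = tf.saved_model.load("faster_rcnn_resnet50/saved_model")

KEEP = {1, 3, 6, 8}  # person, car, bus, truck (COCO ids listed above)

with open("detections.csv", "w") as out:
    for frame_path in sorted(glob.glob("*.jpg")):  # frames extracted by ffmpeg
        image = np.array(Image.open(frame_path))
        detections = detect_fn(tf.convert_to_tensor(image)[tf.newaxis, ...])
        boxes = detections["detection_boxes"][0].numpy()     # normalized [ymin, xmin, ymax, xmax]
        classes = detections["detection_classes"][0].numpy().astype(int)
        scores = detections["detection_scores"][0].numpy()
        for box, cls, score in zip(boxes, classes, scores):
            if cls in KEEP and score >= 0.5:
                ymin, xmin, ymax, xmax = box
                out.write(f"{frame_path},{cls},{score:.3f},"
                          f"{xmin:.4f},{ymin:.4f},{xmax:.4f},{ymax:.4f}\n")
```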
- Object Speed
- Object Size
- Percentage of Frames with Objects
- Object Arrival Rate
- Total Object Size
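The list above names per-video features derived from the detection results. The exact definitions used in the project may differ; the following is a minimal sketch of two of them, assuming each frame is associated with a list of normalized bounding boxes:

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalized to [0, 1]

def object_size(box: Box) -> float:
    """Area of a single box as a fraction of the frame."""
    ymin, xmin, ymax, xmax = box
    return max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

def percent_frames_with_objects(frame_boxes: Dict[int, List[Box]]) -> float:
    """Percentage of frames containing at least one detected object."""
    if not frame_boxes:
        return 0.0
    return 100.0 * sum(1 for boxes in frame_boxes.values() if boxes) / len(frame_boxes)

def total_object_size(frame_boxes: Dict[int, List[Box]]) -> float:
    """Sum of object areas over all frames."""
    return sum(object_size(b) for boxes in frame_boxes.values() for b in boxes)
```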
- VideoStorm paper, Implementation
  VideoStorm tunes video frame rate, frame resolution, and model complexity to save the GPU computing cost required in video analytics tasks. It uses offline profiling techniques to choose good configurations, and provides a scheduling algorithm to coordinate jobs across multiple machines.
- Glimpse paper, Implementation
  The Glimpse client sends selected frames to the Glimpse server for object detection, and runs tracking on the unselected frames in order to save GPU computing cost. Glimpse selects frames by measuring the pixel difference across frames and tracks objects using optical flow (a frame-selection sketch follows this list).
- NoScope paper, Implementation
  NoScope uses cheap, specialized models on the client side and only sends the undetermined frames to the server for golden-model inference.
- Vigil paper, Implementation
  Vigil uses the outputs of a simple model on the client side to crop out useful regions, and only the useful regions are encoded and sent to the server for inference. This saves the bandwidth of video transmission.
- AWStream paper, Implementation
  AWStream tunes video frame rate, frame resolution, and the quality parameter to save the bandwidth required in video transmission. It uses offline and online profiling techniques to choose configurations that save bandwidth while maintaining inference accuracy.
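To make the Glimpse frame-selection idea above concrete, here is a minimal sketch of pixel-difference-based frame selection with OpenCV. It only illustrates the general triggering idea; the threshold value, the difference metric, and the function name are assumptions, not the paper's or the linked implementation's exact logic.

```python
import cv2
import numpy as np

def select_frames(video_path: str, diff_thresh: float = 0.1) -> list:
    """Return indices of frames whose mean absolute pixel difference from the
    last selected frame exceeds diff_thresh; these are the frames that would be
    sent to the server for detection, while the rest would be handled by tracking."""
    cap = cv2.VideoCapture(video_path)
    selected, last_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
        if last_gray is None or float(np.mean(np.abs(gray - last_gray))) > diff_thresh:
            selected.append(idx)
            last_gray = gray
        idx += 1
    cap.release()
    return selected
```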