Please set up a python virtual environment and install the packages listed in requirements.txt.
# set up a python3 virtualenv environment (the directory name "venv" below is arbitrary)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Use youtube-dl to download a video.
- Install youtube-dl, e.g.
  pip install youtube-dl
- Use
  youtube-dl -F <youtube URL>
  to list all formats, and choose a format id (e.g. 96, 137, best).
- If it is a live video, use the following command:
  ffmpeg -i $(youtube-dl -f {format_id} -g <youtube URL>) -c copy -t 00:20:00 {VIDEONAME}.ts
- If it is not a live video, run:
  ffmpeg -i $(youtube-dl -f {format_id} -g <youtube URL>) -c copy {VIDEONAME}.mp4
- Or use streamlink to download a video.
- Use ffmpeg to transform the video and extract the frames, e.g.
  ffmpeg -i <Video Filename> %06d.jpg -hide_banner
- The DNN models used in this project are downloaded from the TensorFlow Model Zoo.
- Models used in this project:
- Labels are in COCO format. Labels used in this project (a small lookup sketch follows this list):
- 1 person
- 3 car
- 6 bus
- 8 truck
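These ids follow the standard COCO label map used by the Model Zoo detectors. As a small convenience sketch (the table and helper name below are not from this repo), they can be kept in a lookup table:

```python
# COCO class ids kept in this project (see the label list above).
LABELS_OF_INTEREST = {1: "person", 3: "car", 6: "bus", 8: "truck"}

def keep_detection(class_id: int) -> bool:
    """True if a detection belongs to one of the classes used in this project."""
    return int(class_id) in LABELS_OF_INTEREST
```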
- The object detection results can be generated using object_detection.
The video dataset wrapper and frame extraction scripts are in videos. Please check here.
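As a rough illustration of how per-frame detection results could be produced from the extracted frames, here is a minimal sketch assuming a TF2 SavedModel export from the Model Zoo. The model path, output file name, and score threshold below are assumptions; the actual scripts in this repo may load models and write results differently.

```python
import glob
import numpy as np
import tensorflow as tf
from PIL import Image

# Hypothetical path to a detector exported from the TensorFlow Model Zoo.
detect_fn = tf.saved_model.load("faster_rcnn_resnet50/saved_model")

KEEP = {1, 3, 6, 8}  # person, car, bus, truck (COCO ids listed above)

with open("detections.csv", "w") as out:
    for frame_path in sorted(glob.glob("*.jpg")):  # frames extracted by ffmpeg
        image = np.array(Image.open(frame_path))
        detections = detect_fn(tf.convert_to_tensor(image)[tf.newaxis, ...])
        boxes = detections["detection_boxes"][0].numpy()     # normalized [ymin, xmin, ymax, xmax]
        classes = detections["detection_classes"][0].numpy().astype(int)
        scores = detections["detection_scores"][0].numpy()
        for box, cls, score in zip(boxes, classes, scores):
            if cls in KEEP and score >= 0.5:
                ymin, xmin, ymax, xmax = box
                out.write(f"{frame_path},{cls},{score:.3f},"
                          f"{xmin:.4f},{ymin:.4f},{xmax:.4f},{ymax:.4f}\n")
```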
- Object Speed
- Object Size
- Percentage of Frames with Objects
- Object Arrival Rate
- Total Object Size
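The list above names per-video features derived from the detection results. The exact definitions used in the project may differ; the following is a minimal sketch of two of them, assuming each frame is associated with a list of normalized bounding boxes:

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalized to [0, 1]

def object_size(box: Box) -> float:
    """Area of a single box as a fraction of the frame."""
    ymin, xmin, ymax, xmax = box
    return max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

def percent_frames_with_objects(frame_boxes: Dict[int, List[Box]]) -> float:
    """Percentage of frames containing at least one detected object."""
    if not frame_boxes:
        return 0.0
    return 100.0 * sum(1 for boxes in frame_boxes.values() if boxes) / len(frame_boxes)

def total_object_size(frame_boxes: Dict[int, List[Box]]) -> float:
    """Sum of object areas over all frames."""
    return sum(object_size(b) for boxes in frame_boxes.values() for b in boxes)
```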
- VideoStorm paper, Implementation
  VideoStorm tunes video frame rate, frame resolution, and model complexity to save the GPU computing cost required in video analytics tasks. It uses offline profiling techniques to choose good configurations, and provides a scheduling algorithm to coordinate jobs across multiple machines.
- Glimpse paper, Implementation
  The Glimpse client sends selected frames to the Glimpse server for object detection, and runs tracking on the unselected frames in order to save GPU computing cost. Glimpse selects frames by measuring the pixel difference across frames and tracks objects using optical flow (a frame-selection sketch follows this list).
- NoScope paper, Implementation
  NoScope uses cheap, specialized models on the client side and only sends the undetermined frames to the server for golden-model inference.
- Vigil paper, Implementation
  Vigil uses the outputs of a simple model on the client side to crop out useful regions, and only the useful regions are encoded and sent to the server for inference. This saves the bandwidth of video transmission.
- AWStream paper, Implementation
  AWStream tunes video frame rate, frame resolution, and the quality parameter to save the bandwidth required in video transmission. It uses offline and online profiling techniques to choose configurations that save bandwidth while maintaining inference accuracy.
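To make the Glimpse frame-selection idea above concrete, here is a minimal sketch of pixel-difference-based frame selection with OpenCV. It only illustrates the general triggering idea; the threshold value, the difference metric, and the function name are assumptions, not the paper's or the linked implementation's exact logic.

```python
import cv2
import numpy as np

def select_frames(video_path: str, diff_thresh: float = 0.1) -> list:
    """Return indices of frames whose mean absolute pixel difference from the
    last selected frame exceeds diff_thresh; these are the frames that would be
    sent to the server for detection, while the rest would be handled by tracking."""
    cap = cv2.VideoCapture(video_path)
    selected, last_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
        if last_gray is None or float(np.mean(np.abs(gray - last_gray))) > diff_thresh:
            selected.append(idx)
            last_gray = gray
        idx += 1
    cap.release()
    return selected
```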