Video to frames processing with Pachyderm

This project demonstrates how to use Pachyderm for distributed video processing. In this tutorial, we'll upload Seals.mpg, a 3-second video of a seal, to an input Pachyderm repo. Next, we'll develop a Pachyderm pipeline that splits the video into frames and writes 76 JPG images to an output Pachyderm repo.

Important Notes

This video processing example is based on the following Beginner Tutorial. Use that tutorial as a how-to guide for how this project was created.

This example is deliberately minimal: it processes a single video, which doesn't highlight the true power of Pachyderm. Pachyderm shines when users need to process thousands of videos in parallel and keep the resulting frames organized. To learn how to extend this example into a more real-world use case, see the following documentation:

Additionally, please reach out for help on the Pachyderm Slack Community.

File Overview

frames.py uses OpenCV, a popular Python library for computer vision, to split a video into frames. This code was largely borrowed from this repo.
frames.json defines the Pachyderm pipeline for this distributed video processing.
The Dockerfile was used to create the anaisdg/opencv image, which processes each video. The anaisdg/opencv image builds on the jjanzic/docker-python3-opencv image, which is based on the official Python 3 image and adds OpenCV. The Dockerfile is not required to run the project, but serves as an example for those who want to build their own project and pipeline.
frames_solo.py and requirements.txt were used during the development of this project to ensure that the video processing script was successful. They are not required to run this project.
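
To make the role of frames.py concrete, here is a minimal sketch of how a script like it could split videos into frames with OpenCV. It is not the project's actual frames.py. The function names, the output file naming scheme, and the zero-padded frame index are assumptions made for illustration. The /pfs/videos and /pfs/out paths follow Pachyderm's convention of mounting an input repo at /pfs/<repo> and collecting output from /pfs/out.

```python
import os


def frame_path(out_dir, video_path, index):
    """Build an output name such as <out_dir>/Seal_frame0003.jpg (hypothetical scheme)."""
    stem = os.path.splitext(os.path.basename(video_path))[0]
    return os.path.join(out_dir, f"{stem}_frame{index:04d}.jpg")


def split_video(video_path, out_dir):
    """Write every frame of video_path to out_dir as a JPG and return the frame count."""
    import cv2  # imported here so frame_path can be used without OpenCV installed

    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    count = 0
    try:
        while True:
            ok, frame = capture.read()  # ok becomes False once the video is exhausted
            if not ok:
                break
            cv2.imwrite(frame_path(out_dir, video_path, count), frame)
            count += 1
    finally:
        capture.release()
    return count


if __name__ == "__main__":
    # Inside a pipeline container, the videos repo is mounted at /pfs/videos.
    for dirpath, _, filenames in os.walk("/pfs/videos"):
        for filename in filenames:
            split_video(os.path.join(dirpath, filename), "/pfs/out")
```

Prefixing each frame with the video's name keeps the outputs of different videos from colliding when many videos are processed into the same output repo.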

Prerequisites

Install and deploy Pachyderm community edition locally. Follow this Getting Started guide to get up and running.
Install Docker Desktop locally.
Install the Pachyderm CLI (pachctl).
Install the Kubernetes CLI (kubectl).

Execute

Create an input repo with pachctl create repo videos.
Verify that the repo creation was successful with pachctl list repo or through the UI.
Add Seal.mpg to the input repo with pachctl put file videos@master:Seal.mpg -f ./Seal.mpg
Verify that the file was added to videos successfully with pachctl list file videos@master or through the UI.
Create a pipeline with pachctl create pipeline -f frames.json.
Verify that the video processing job succeeded and that the output repo, images, was created with pachctl list job or through the UI.
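
For readers writing their own pipeline file, the sketch below generates a pipeline spec of the kind passed to pachctl create pipeline -f. It is not the project's actual frames.json. The pipeline name is assumed to be images, because a pipeline's output repo takes the pipeline's name. The command python3 /frames.py and the helper build_pipeline_spec are also assumptions. The glob pattern /* treats each top-level file in videos as its own datum, which is what lets Pachyderm process many videos in parallel.

```python
import json


def build_pipeline_spec(name, input_repo, image, cmd, glob="/*"):
    """Return a minimal Pachyderm pipeline spec as a dict (hypothetical helper)."""
    return {
        "pipeline": {"name": name},
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
        "transform": {"image": image, "cmd": cmd},
    }


spec = build_pipeline_spec(
    name="images",                 # assumed: matches the output repo name
    input_repo="videos",           # the input repo created in the steps above
    image="anaisdg/opencv",        # the image described in the File Overview
    cmd=["python3", "/frames.py"], # assumed location of the script in the image
)

# Print the spec as JSON; redirect this to a file to use it with pachctl.
print(json.dumps(spec, indent=2))
```

Running the script and redirecting its output to a .json file yields a spec that can be compared against frames.json in this repo.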

Debugging Tips

View the logs for each job in the UI or with
Run kubectl get pods to view the status of each pod.
Run pachctl list pipeline to view the status of each pipeline.
