This project demonstrates how to use Pachyderm for distributed video processing. In this tutorial, we'll upload Seal.mpg, a 3-second video of a seal, to an input Pachyderm repo. Next, we'll develop a Pachyderm pipeline that splits the video into frames and writes 76 JPEG images to an output Pachyderm repo.
This video processing example is based on the following Beginner Tutorial. Use that tutorial as a how-to guide for how this project was created.
This example is minimal: it processes a single video, which doesn't highlight the true power of Pachyderm. Pachyderm shines when users need to process thousands of videos in parallel and keep their frame outputs organized. To learn how to extend this example into a more realistic use case, see the following documentation:
Additionally, please reach out for help on the Pachyderm Slack Community.
- frames.py uses OpenCV, a popular Python library for computer vision, to split a video into frames. This code was largely borrowed from this repo.
- frames.json defines the Pachyderm pipeline for this distributed video processing.
- The Dockerfile was used to create the anaisdg/opencv image, which is used to process each video. The anaisdg/opencv image builds on the jjanzic/docker-python3-opencv image, which is based on the official Python 3 image with OpenCV added. This Dockerfile is not required to run the project, but serves as an example for those who want to build their own project and pipeline.
- frames_solo.py and requirements.txt were used during the development of this project to ensure that the video processing script was successful. They are not required to run this project.
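To make the role of frames.py more concrete, here is a minimal sketch of splitting a video into JPEG frames with OpenCV. It is not the project's actual script: the function names, the output naming scheme, and the *.mpg filter are assumptions made for illustration. The /pfs/videos and /pfs/out paths follow Pachyderm's convention of mounting an input repo at /pfs/<repo> and collecting output from /pfs/out.

```python
from pathlib import Path


def frame_filename(video_path: str, index: int) -> str:
    """Build an output name such as 'Seal_frame0007.jpg' (naming scheme is an assumption)."""
    return f"{Path(video_path).stem}_frame{index:04d}.jpg"


def split_video(video_path: str, out_dir: str) -> int:
    """Write every frame of video_path to out_dir as a JPEG and return the frame count."""
    import cv2  # imported here so the naming helper works without OpenCV installed

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    count = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # no more frames, or the file could not be decoded
                break
            target = Path(out_dir) / frame_filename(video_path, count)
            cv2.imwrite(str(target), frame)
            count += 1
    finally:
        capture.release()
    return count


if __name__ == "__main__":
    # Inside a pipeline container, the input repo is mounted under /pfs/videos
    # and anything written to /pfs/out lands in the output repo.
    for video in sorted(Path("/pfs/videos").glob("*.mpg")):
        print(video.name, split_video(str(video), "/pfs/out"))
```

Prefixing each frame with the video's stem keeps outputs from different videos apart, which matters once the pipeline processes more than one file.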
- Install and deploy Pachyderm community edition locally. Follow this Getting Started guide to get up and running.
- Install Docker Desktop locally.
- Install the Pachyderm CLI.
- Install the Kubernetes CLI.
- Create an input repo with
pachctl create repo videos
- Verify that the repo was created with
pachctl list repo
or through the UI.
- Add Seal.mpg to the input repo with
pachctl put file videos@master:Seal.mpg -f ./Seal.mpg
- Verify that the file was added to videos with
pachctl list file videos@master
or through the UI.
- Create a pipeline with
pachctl create pipeline -f frames.json
- Verify that the video processing job succeeded and populated the output repo, images, with
pachctl list job
or through the UI.
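The pipeline spec passed to pachctl create pipeline in the steps above can be sketched as follows. This is an illustrative guess at the shape of such a spec, not the contents of the project's frames.json: the pipeline name, the command, and the /frames.py path are assumptions. The field names (pipeline, transform, input.pfs, glob, parallelism_spec) come from the Pachyderm pipeline specification. A glob of /* treats each top-level file in the input repo as its own datum, which is what allows many videos to be processed in parallel.

```python
import json


def build_pipeline_spec(name="images", input_repo="videos",
                        image="anaisdg/opencv", workers=1):
    """Return a Pachyderm-style pipeline spec as a plain dict.

    Defaults are assumptions for illustration. The output repo takes the
    pipeline's name, so "images" matches the output repo named in the steps.
    """
    return {
        "pipeline": {"name": name},
        "transform": {
            "image": image,
            "cmd": ["python3", "/frames.py"],  # hypothetical script location
        },
        "input": {
            # "/*" makes every top-level file (each video) a separate datum
            "pfs": {"repo": input_repo, "glob": "/*"}
        },
        # number of workers that process datums concurrently
        "parallelism_spec": {"constant": workers},
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline_spec(workers=4), indent=2))
```

With a single video there is only one datum, so extra workers bring no benefit; raising workers pays off once the videos repo holds many files.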
- View the logs for each job in the UI or with
- Run
kubectl get pods
to view the status of each pod.
- Run
pachctl list pipeline
to view the status of each pipeline.
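The status checks above can be bundled into one small helper. This is only a convenience sketch: the helper name and the returned dictionary layout are assumptions, and it runs exactly the commands already listed in this README without adding any flags.

```python
import shutil
import subprocess

# Commands taken verbatim from the steps in this README.
STATUS_COMMANDS = [
    ["pachctl", "list", "job"],
    ["kubectl", "get", "pods"],
    ["pachctl", "list", "pipeline"],
]


def run_status_checks(commands=STATUS_COMMANDS):
    """Run each command and map its text to stdout, stderr on failure, or None if missing."""
    results = {}
    for argv in commands:
        label = " ".join(argv)
        if shutil.which(argv[0]) is None:
            results[label] = None  # tool is not installed or not on PATH
            continue
        done = subprocess.run(argv, capture_output=True, text=True)
        results[label] = done.stdout if done.returncode == 0 else done.stderr
    return results
```

Calling run_status_checks() and printing each entry gives a quick snapshot of jobs, pods, and pipelines; a value of None means the corresponding CLI still needs to be installed.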