iejMac / video2numpy

Optimized library for large-scale extraction of frames and audio from video.

Frame decoding cluster

iejMac opened this issue

Outlining a plan for turning this into a super nice video dataloader. The main missing functionality is letting the user exploit the imbalance between cheap CPU compute and expensive GPU compute: launch a frame decoding cluster on the CPU cluster, connect to it from the GPU cluster, and simply send requests for decoded video shards.

Steps in the process

  1. Launch a cluster of N frame workers. They begin decoding videos into some shared memory structure
  2. Dataloader requests data from the cluster
  3. Cluster sends metadata about shards over some link (sending only metadata is fine since it is small)
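
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: `ShardMeta`, `publish_shard`, and `fetch_shard` are hypothetical names, the "link" is an in-process `queue.Queue` standing in for a real network connection, and the decoded pixels are fake bytes placed in a `multiprocessing.shared_memory` block.

```python
from dataclasses import dataclass
from multiprocessing import shared_memory
import queue


@dataclass
class ShardMeta:
    """Hypothetical message sent to the loader instead of the pixels."""
    shm_name: str  # name of the shared memory block holding the shard
    shape: tuple   # (frames, height, width, channels)
    nbytes: int    # payload size; the block itself may be rounded up


def publish_shard(pixels: bytes, shape: tuple, link: queue.Queue):
    """Worker side: copy decoded pixels into shared memory, send metadata."""
    shm = shared_memory.SharedMemory(create=True, size=len(pixels))
    shm.buf[:len(pixels)] = pixels
    link.put(ShardMeta(shm.name, shape, len(pixels)))
    return shm  # the worker keeps the handle so it can unlink the block later


def fetch_shard(link: queue.Queue):
    """Loader side: receive metadata, attach to the block, copy pixels out."""
    meta = link.get()
    shm = shared_memory.SharedMemory(name=meta.shm_name)
    try:
        pixels = bytes(shm.buf[:meta.nbytes])
    finally:
        shm.close()
    return meta, pixels


link = queue.Queue()            # assumption: stand-in for the real link
shape = (2, 4, 4, 3)            # two tiny RGB frames
fake_pixels = bytes(range(96))  # 2 * 4 * 4 * 3 = 96 bytes of "decoded" data

owner = publish_shard(fake_pixels, shape, link)
meta, received = fetch_shard(link)
owner.close()
owner.unlink()                  # free the block once the loader is done
```

Only the small `ShardMeta` record crosses the link; the pixels stay in shared memory. With NumPy available, the loader could view the block without copying via `np.ndarray(meta.shape, dtype=np.uint8, buffer=shm.buf)`, as long as the array is not used after the block is closed.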

What concepts do we need:

  • Video Manager: which videos get decoded and how they get allocated in memory
  • Frame decoder: how do you decode each video specifically
  • Shared memory data structure: should support a variety of options: S3, /fsx, etc.
  • Some schema for how the data is organized in memory: shared frame queue? just shard with ids?
  • Communication: what do we send to the loader?
  • Loader: how does the loader extract pixels from memory + metadata from communication
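
One way to keep the storage backend swappable (S3, /fsx, etc.) would be a small interface that the workers write to and the loader reads from. The sketch below is an assumption, not an existing video2numpy API: `ShardStore` and `DirectoryStore` are hypothetical names, and only a filesystem-path backend is shown (an S3 backend would implement the same two methods).

```python
import os
import tempfile
from abc import ABC, abstractmethod


class ShardStore(ABC):
    """Hypothetical interface for wherever decoded shards live."""

    @abstractmethod
    def put(self, shard_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, shard_id: str) -> bytes: ...


class DirectoryStore(ShardStore):
    """Backend for any mounted path, e.g. a shared /fsx mount."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, shard_id: str) -> str:
        return os.path.join(self.root, f"{shard_id}.bin")

    def put(self, shard_id: str, data: bytes) -> None:
        # Write to a temporary file, then rename, so a reader never
        # observes a partially written shard.
        tmp = self._path(shard_id) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, self._path(shard_id))

    def get(self, shard_id: str) -> bytes:
        with open(self._path(shard_id), "rb") as f:
            return f.read()


with tempfile.TemporaryDirectory() as root:
    store = DirectoryStore(root)
    store.put("shard-00000", b"\x00\x01\x02")
    roundtrip = store.get("shard-00000")
```

With this shape, the "just shard with ids" schema reduces to agreeing on the `shard_id` strings, and the communication step only needs to send those ids to the loader.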

Video Manager:

  • Important to support things like shuffle buffers and whatnot
  • Calls workers with data
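
For reference, a shuffle buffer of the kind mentioned above can be sketched in a few lines. This is a generic streaming shuffle, not code from this repo; `shuffle_buffer` and the `video_ids` list are hypothetical names used for illustration.

```python
import random


def shuffle_buffer(items, buffer_size, seed=None):
    """Yield items in approximately shuffled order using bounded memory."""
    rng = random.Random(seed)
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]
            yield buf.pop()
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buf)
    yield from buf


video_ids = [f"video_{i}" for i in range(10)]
order = list(shuffle_buffer(video_ids, buffer_size=4, seed=0))
```

A Video Manager could run its list of videos through something like this before handing them to workers, so the decode order is randomized without holding the whole dataset index in memory. A larger `buffer_size` gives a more thorough shuffle at the cost of more buffered items.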

Frame decoder: (already solved here)