Xingtao / FFdynamic

Library with dynamic audio/video composition and runtime control

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Build Status

FFdynamic - Extend FFmpeg with run time control and dynamic audio/video composition

This project shipped with two parts: FFdynamic library and applications build on FFdynamic


Contents


An application Interactive Live

Interactive Live (ial for short) is an application based on FFdynamic.
Ial does multiple video and audio input mixing, then streams it out. It could be run on phones or cloud servers.

ial gives flexiable control over the mixing process (dynamical layout change, backgroup picture change, mute/unmute, etc..), shown in the following gif:

Layout auto change or set as request

Layout auto change or set to certain pattern during mixing by request

This picture shows, 1. auto layout change when a new stream joining in (from 2 cells to 3 cells); 2. layout changes to 4 and 9 cells by http requeset. Changes are quite smooth, without any frozen or stuck, due to audio/video sync message communication mechnism.

For more details(capacities, http apis, etc..), please refer to the application


Dynamic Detect example

Dynamic Detect is a playgroud one can change object detector types at run time while reading online video streams or local files. Those detectors are loaded via OpenCV api. Models of darknet yolo3, caffe vgg-ssd, and tensorflow mobilenet-ssd (all in coco dataset) are tested. Here is an output stream gif, which run 2 detecors in parallle, draw boxes and texts when they locate interested objects.

draw_detection

For more details, please refer to the application and its unit test


FFdynamic library Overview

  • Extending: FFdynamic extends FFmpeg in the manner of doing video/audio process compositionally and each component's state can be dynamically changed on the fly.
  • compositional FFdynamic is structured in a modular way. It takes each component (demux, decode, filter, encode, muxer, and cutomized component) as a building block and they can be combined together as desired at creationg time or run time.
    For instance, if we are developing a dehaze algorithm and would like to know how good the dehazed algorithm visually (in compare to original one). FFdynamic provides facilities that allow one to easily realize following composition:
Demux |-> Audio Decode -> |-> Audio Encode ------------------------------------------> |
      |                                                                                | -> Muxer
      |                   |-> Dehaze Filter -> |                                       |
      |-> Video Decode -> |                    | Mix original and dehazed ->| Encode ->|
                          | -----------------> |

As shown, after demux the input stream, we do video decode which will output to two components: 'Dehaze Filter' component and 'mix video' component; after dehaze, its image also output to 'mix video' component, in there we mix original and dehazed image into one. The whole example is here. Normally, one can freely combine components as long as the input data can be processed.

  • on the fly FFdynamic has a runtime event dispatch module, which can pass request to the component needs dynamical state change. For instance, one could set dynamical 'Key Frame' request to video encoder or 'mute' one audio stream.
    FFdynamic also has a runtime components pub-sub module, which each component can subscribe interesed events from other components. For instance, one video encoder in a live show is willing to know the region of people faces in the incoming image', so that it could set more bitrate to this region. We can do this by subscribe events to a face detecting component and get published event with ROI.
  • customization One can define their own components, for instance

    • a RTP demuxer with private fields
    • an object detection module
    • a packet sending control module

    Those components are plugins. Once they are done, they can be composed with other components.

In short, FFdynamic is a scaffold allows develop complex audio/video application in a higher and flexiable manner.
It is suitable for two kind of applications:

  • real time audio/video process: live broadcast, video conference backend, transcoding, etc.. with run time control;
  • develop new video/audio process algorithm which needs video clips as inputs and video clips as outputs, and communication or coorperation needed between video and audio streams;

Do transcoding in a dozen lines of code, see here


Here we take the 'dehaze', mentioned in the 'Overview' part, as the example. We developed a dehaze algorithm and make it a FFdynamic's component. Then mix original and dehazed image together to check the result visually.

Refer to here for plugin source files.


Installation

Dependency Required

  • FFMpeg, glog, cmake (minimal version 3.2)
  • compiler supports at least c++11
  • boost, protobuf3 (optional, only for the application 'Interactive Live')
  • opencv (optional, if you would like to run plugin example)

protobuf3 is not well supports by some linux distributions' package manager, here is a manually compiling script (sudo required):

DIR=$(mktemp -d) && cd ${DIR} && \
git clone https://github.com/protocolbuffers/protobuf.git && cd protobuf && \
git submodule update --init --recursive && \
./autogen.sh && ./configure && \
make && make check && \
sudo make install && sudo ldconfig

For Ubuntu / CentOS

Install FFmpeg as usal, then

apt install -y cmake3 libgflags-dev libgoogle-glog-dev libboost-all-dev
or 
yum install -y glog-devel gflags-devel cmake3 boost-devel  

For Mac

Install FFmpeg as usal, then
brew install cmake glog gflags protobuf boost

Docker build

To alleviate the build process, there is a docker with all dependencies installed that you can play with.

Build library and apps

    Under FFdynamic folder: 
          'sh build.sh'  will build FFdynamic library (need sudo if make install)
    Under app/interactiveLive folder: 
          'sh build.sh'  will build FFdynamic library and ial program.

Optional Dependencies - TODO

  • nvidia driver, cuda tookit, if you prefer using nvidia codec

Contribution

All contributions are welcome. Some TODOs:

  • A webui that can easily operate on Interactive Live application;
  • 'Interactive live' set video cell's border line, border color;
  • Statistics for each component, process time, latency time, detailed info, etc..;
  • An auto data format negotiate module between components;
  • Expose event register to user for easily event/data callback process;

About

Library with dynamic audio/video composition and runtime control

License:MIT License


Languages

Language:C++ 91.9%Language:CMake 5.2%Language:C 1.4%Language:Python 1.1%Language:Shell 0.3%Language:Makefile 0.2%Language:Lua 0.0%