jwliu-cc / SVG

Sounding Video Generator (SVG) is the first unified framework for text-guided video-audio generation.



Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

  • This is the official repository of Sounding Video Generator (SVG; TMM version, arXiv version), which is, to the best of our knowledge, the first unified framework for Text-to-Sounding-Video (T2SV) generation.
  • The latest version of the AudioSet-Cap dataset is VALOR-1M, which contains more videos and annotations. The AudioSet-Cap test set can be found at /assets/AudioSet-Cap_test.json.
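For readers who want to inspect the AudioSet-Cap test split mentioned above, here is a minimal Python sketch for loading it. It is an illustration under stated assumptions, not code from this repository: the helper names `load_test_set` and `count_entries` are hypothetical, and because the README does not describe the record schema, the sketch only parses the JSON and counts its top-level entries instead of assuming any field names.

```python
import json
from pathlib import Path
from typing import Any


def load_test_set(path: str) -> Any:
    """Parse the AudioSet-Cap test split from a JSON file.

    Hypothetical helper: the record schema is not documented in the README,
    so the parsed object is returned unchanged for the caller to inspect.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def count_entries(data: Any) -> int:
    """Count top-level entries, whether the file holds a JSON list or object."""
    if isinstance(data, (list, dict)):
        return len(data)
    # A bare scalar at the top level counts as a single entry.
    return 1
```

For example, calling `load_test_set("assets/AudioSet-Cap_test.json")` from the repository root (assuming that relative path corresponds to /assets/AudioSet-Cap_test.json) returns the parsed data, and `count_entries` reports how many top-level records it holds.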

Sounding Video Samples

Click a picture to play the corresponding sounding video. More sampled videos and audio clips can be found in assets.

Input Text Generated Result
The grass was green, with blue sky and white clouds, and the wind.
A man in a blue shirt was playing the guitar.
A woman with long hair sang in the room.
A man in a suit and glasses speaks indoors.
In the game, a yellow car roars along the road.
In the music, white text plays in front of a black background.

License: Apache License 2.0