SilmeLyy / NUWA

A unified 3D Transformer Pipeline for visual synthesis

Overview

This is the official repo for the paper: NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion.

Overview

NÜWA is a unified multimodal pre-trained model that can generate new or manipulate existing visual data (i.e., images and videos) for 8 visual synthesis tasks (as shown above).

Samples

Text-to-Image (T2I)

t2i

Sketch-to-Image (S2I)

s2i

Image Completion (I2I)

i2i

Text-Guided Image Manipulation (TI2I)

ti2i

Text-to-Video (T2V)

t2v

Video Prediction (V2V)

v2v

Sketch-to-Video (S2V)

s2v

Text-Guided Video Manipulation (TV2V)

out_final
