Dcam


Dcam is an experimental project that applies the Stable Diffusion image-to-image process to a real-time video input.

Dcam uses Apple's Core ML port of Stable Diffusion (ml-stable-diffusion) to run the pipeline on macOS. It also uses the NDI protocol to stream video and controller inputs from iPhone to Mac over LAN.
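
The sketch below shows how the receiving side of this architecture could look in Unity, assuming keijiro's KlakNDI package (Klak.Ndi namespace), where an NdiReceiver component exposes the latest decoded frame through its texture property. The Img2ImgPipeline wrapper named in the comment is hypothetical; it stands in for whatever bridge the project uses to call Apple's ml-stable-diffusion pipeline and is not part of any published API.

```csharp
using UnityEngine;
using Klak.Ndi; // keijiro's KlakNDI package for NDI send/receive in Unity

// Minimal sketch: receive the iPhone video over NDI and hand each frame to an
// image-to-image step. This illustrates the data flow, not the actual code.
public sealed class DiffusionCamera : MonoBehaviour
{
    [SerializeField] NdiReceiver _receiver = null; // receives the iPhone stream
    [SerializeField] RenderTexture _input = null;  // 640x384 model input

    void Update()
    {
        var frame = _receiver.texture;             // latest decoded NDI frame
        if (frame == null) return;

        // Copy the frame into the model-sized buffer, then kick off the
        // (hypothetical) img2img step asynchronously to keep Unity responsive.
        Graphics.Blit(frame, _input);
        // Img2ImgPipeline.EnqueueAsync(_input, prompt: "...", strength: 0.6f);
    }
}
```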

I used this project for concert visuals at Channel 23 and other events.

System requirements

  • Unity 2023.2
  • macOS 14 Sonoma
  • Mac computer with a high GPU core count (M2 Max or later recommended)
  • iPhone with a triple-lens rear camera (iPhone 13 Pro or later)
  • Network connection (or USB connection; see below)

You can connect an iPhone and a Mac with a USB cable to establish a VLAN between them, which is handy for maintaining a robust connection at a concert venue.

About the Stable Diffusion model

This project uses a Core ML Stable Diffusion model with a landscape aspect ratio (640x384). You can download the model from the Hugging Face repository below.

https://huggingface.co/keijiro-tk/coreml-stable-diffusion-2-1-base-640x384
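
Since the iPhone camera delivers 16:9 frames and the model expects a 5:3 (640x384) input, the frames have to be cropped or scaled before inference. Here is one way to do an aspect-fill crop using Unity's Graphics.Blit overload with source scale/offset; this is my own illustration, not the project's actual preprocessing code.

```csharp
using UnityEngine;

// Sketch: aspect-fill a 16:9 camera frame into a 640x384 (5:3) model input,
// cropping the excess on one axis instead of distorting the image.
static class AspectFill
{
    public static void Blit(Texture src, RenderTexture dst) // dst is 640x384
    {
        float srcAspect = (float)src.width / src.height;    // e.g. 16/9 ≈ 1.78
        float dstAspect = (float)dst.width / dst.height;    // 640/384 ≈ 1.67
        // Scale < 1 on the axis we crop; offset centers the sampled region.
        var scale = srcAspect > dstAspect
            ? new Vector2(dstAspect / srcAspect, 1)         // crop left/right
            : new Vector2(1, srcAspect / dstAspect);        // crop top/bottom
        var offset = (Vector2.one - scale) / 2;
        Graphics.Blit(src, dst, scale, offset);
    }
}
```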

About the latency

The Stable Diffusion image-to-image process has a few seconds of latency, even on a powerful Mac computer. I tried hiding this latency by inserting flipbook-like effects.
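
One way such a flipbook effect could work is to keep the last few generated stills and cycle through them at a fixed rate, so the screen stays in motion while the next result is being computed. The sketch below is a hypothetical illustration of that idea in Unity, not the effect implementation used in this project.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of a flipbook-style presenter: generated stills arrive every few
// seconds, so we hold the most recent pages and flip between them at a
// constant rate to mask the pipeline latency.
public sealed class FlipbookPresenter : MonoBehaviour
{
    [SerializeField] Renderer _target = null; // quad showing the visuals
    [SerializeField] float _flipRate = 8;     // page flips per second

    readonly List<Texture> _pages = new List<Texture>();

    // Called whenever the diffusion pipeline finishes a new still.
    public void PushResult(Texture page)
    {
        _pages.Add(page);
        if (_pages.Count > 8) _pages.RemoveAt(0); // keep only recent pages
    }

    void Update()
    {
        if (_pages.Count == 0) return;
        var index = (int)(Time.time * _flipRate) % _pages.Count;
        _target.material.mainTexture = _pages[index];
    }
}
```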

Why Mac/iPhone?

You can reduce the latency by using a PC with a high-performance GPU, but bringing a bulky and heavy PC into a venue is troublesome. I prefer PCs for research purposes but MacBooks for on-site work.

The iPhone is handy for both video input and remote control. I can connect it to the Mac with a USB extender cable (USB repeater) to establish a robust NDI connection.
