DanielSWolf / rhubarb-lip-sync

Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.

How to improve accuracy/naturalness of lip sync

mi2think opened this issue

commented

I made a video comparison with Oculus Lipsync: Oculus vs Rhubarb. It doesn't look pleasant. As a next step, I'll try interpolating between two visemes; maybe it'll look more natural after that. But I doubt I can match Oculus.

Taking one frame from the videos, I found the viseme weights below:

Oculus: Video Percent: 0.0040, Visemes: [0.0218, 0.0004, 0.0001, 0.0005, 0.0009, 0.0004, 0.0001, 0.0001, 0.0334, 0.0001, 0.5889, 0.2065, 0.0023, 0.0065, 0.1380]
Rhubarb: Video Percent: 0.0040, Visemes: E
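
For context: Rhubarb produces a single discrete mouth shape per cue (e.g. in its TSV output), while Oculus produces a dense weight vector every frame. Here's a minimal Python sketch of how Rhubarb's cues could be sampled as one-hot weight vectors for a frame-by-frame comparison; the file name, frame rate, and loop length are placeholder values:

```python
# Sketch: sample Rhubarb's TSV output ("time<TAB>shape" per line) into
# per-frame one-hot weight vectors, comparable to Oculus's dense weights.
from bisect import bisect_right

SHAPES = list("ABCDEFGHX")  # Rhubarb's basic (A-F) + extended (G, H, X) shapes

def load_cues(tsv_path):
    """Parse Rhubarb TSV lines like '0.25<TAB>E' into (time, shape) cues."""
    cues = []
    with open(tsv_path) as f:
        for line in f:
            time_str, shape = line.split()
            cues.append((float(time_str), shape))
    return cues

def one_hot_at(cues, t):
    """Return the one-hot weight vector of the cue active at time t."""
    times = [c[0] for c in cues]
    idx = max(bisect_right(times, t) - 1, 0)
    return [1.0 if s == cues[idx][1] else 0.0 for s in SHAPES]

cues = load_cues("dialog.tsv")  # placeholder file name
fps = 25.0                      # placeholder frame rate
for frame in range(100):
    print(frame, one_hot_at(cues, frame / fps))
```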

Oculus uses 15 visemes (see their Viseme Reference). I really don't know how they calculate the weights across so many visemes; I just know they use a deep neural network.

So, is there any plan or suggestion for Rhubarb Lip Sync?

commented

Update: I added interpolation between two neighboring visemes: Oculus vs Rhubarb with interpolation
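
One possible way to implement that interpolation, continuing the sketch above: a linear crossfade between the one-hot vectors of two neighboring cues over a fixed blend window. The window length is an arbitrary example value, and this is not how Oculus computes its weights:

```python
# Sketch: linear crossfade between neighboring Rhubarb viseme cues.
from bisect import bisect_right

SHAPES = list("ABCDEFGHX")  # same shape set as the sketch above
BLEND = 0.08  # seconds of crossfade before each cue boundary (example value)

def one_hot(shape):
    return [1.0 if s == shape else 0.0 for s in SHAPES]

def interpolated_weights(cues, t):
    """Blend the one-hot vectors of the cues on either side of time t."""
    times = [c[0] for c in cues]
    idx = max(bisect_right(times, t) - 1, 0)
    weights = one_hot(cues[idx][1])
    # Within BLEND seconds of the next cue boundary, fade into its shape.
    if idx + 1 < len(cues):
        dt = cues[idx + 1][0] - t
        if 0 <= dt < BLEND:
            alpha = 1.0 - dt / BLEND  # 0 at window start, 1 at the boundary
            nxt = one_hot(cues[idx + 1][1])
            weights = [(1 - alpha) * w + alpha * n for w, n in zip(weights, nxt)]
    return weights
```

Applied per frame, this replaces the hard viseme switches with short crossfades. It still won't reproduce Oculus's output, since their network predicts mixed weights directly rather than blending discrete shapes.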

There are multiple reasons why the Rhubarb output looks worse than the Oculus output.

  1. The original recording has a lot of reverb. Rhubarb was developed for dry, high-quality recordings and doesn't deal well with reverb.
  2. Rhubarb uses a much simpler architecture than Oculus, with no neural networks.
  3. Rhubarb was never meant for 3D animation. In its current state, Rhubarb is optimized for cartoon-style 2D animation.

The best thing you can do to improve the results is to use a dry recording. But even that won't give you the kind of 3D animation you got with Oculus.