maum-ai / nuwave

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021

Home Page: https://mindslab-ai.github.io/nuwave/

Contribution: checkpoints AVAILABLE!

freds0 opened this issue:

Hi guys,
First I would like to thank @junjun3518 for the excellent work of developing and sharing the code. I trained the model following the paper settings for two weeks on a V100 GPU using ratio=2 and 3. I would like to contribute to the project by sharing the checkpoints. Below are the download links.

nuwave x2:
https://drive.google.com/file/d/1pegayKs-i78yWlPuLIp-BCU8KxxCpBzd/view?usp=sharing

nuwave x3:
https://drive.google.com/file/d/12RUMjEALAs0EoEw6Fqf9ZkpTm3COX6sf/view?usp=sharing
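
For readers unsure what the two ratios mean in practice, here is a minimal sketch of the arithmetic. It assumes the 48 kHz output sample rate described in the NU-Wave paper; the function name `input_spec` is made up for illustration and is not part of the repository.

```python
# Assumption: 48 kHz target sample rate, as described in the NU-Wave paper.
TARGET_SR = 48_000


def input_spec(ratio: int, target_sr: int = TARGET_SR) -> dict:
    """Return the input sample rate and the frequency band the model must fill in."""
    if ratio < 1 or target_sr % ratio != 0:
        raise ValueError("target_sr must be divisible by a positive ratio")
    input_sr = target_sr // ratio
    return {
        "input_sr": input_sr,               # sample rate of the low-resolution input
        "input_nyquist": input_sr // 2,     # highest frequency present in the input
        "generated_band": (input_sr // 2, target_sr // 2),  # band to be generated
    }


for r in (2, 3):
    print(f"x{r}:", input_spec(r))
```

Under that assumption, the x2 checkpoint expects 24 kHz input and the x3 checkpoint expects 16 kHz input.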

The following are images of the training logs.

[Figures: epoch, loss, and val_loss curves for nuwave x2 and nuwave x3]

I also ran the test scripts:

[Screenshot: nuwave_x2 test results]

[Screenshot: nuwave_x3 test results]

Thank you for your great contribution! I have added a link to this issue in README.md!

Hey there @junjun3518 and @freds0. Can I (or @junjun3518, if they'd like) share this model on the Hugging Face Hub?

Thank you for your great contribution! I want to use the checkpoints to upsample music. What kind of hardware do I need? Can a personal laptop run this model?

Yes, it can also run on a CPU (but very slowly).
For music, I don't recommend using this project, since it was trained only on clean speech, without music.

Thanks for getting back to me so quickly. If I train the model on music data and then use it to upsample, is that feasible?

I think it depends on the instruments in the source.
For example, it is hard to apply to electronic music, since the high-frequency sounds in electronic music do not correlate with the low-frequency sounds.
For classical instruments or piano, it should be applicable.
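
To make the correlation point concrete, here is a toy sketch. It is not how NU-Wave works internally; it only shows why harmonic instruments are easier. It assumes an idealized harmonic tone whose partials sit at integer multiples of the fundamental, and the function name and numbers are hypothetical.

```python
def partials_above_cutoff(f0: int, cutoff: int, nyquist: int) -> list:
    """Partials k * f0 of an idealized harmonic tone that lie in (cutoff, nyquist].

    These are the components an upsampler would have to regenerate when the
    input only contains frequencies up to `cutoff`.
    """
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    k = cutoff // f0 + 1          # first harmonic index strictly above the cutoff
    partials = []
    while k * f0 <= nyquist:
        partials.append(k * f0)
        k += 1
    return partials


# Toy numbers: fundamental at 4 kHz, input band up to 8 kHz, output band up to 24 kHz.
missing = partials_above_cutoff(f0=4000, cutoff=8000, nyquist=24000)
print(missing)
```

For such a tone, the missing high-frequency partials follow directly from the fundamental that is still visible in the low band. A synthesized sound with unrelated high-frequency components gives the model no such cue.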

Would you say 700 epochs are enough for training?