descriptinc / cargan

Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"

Home Page: https://maxrmorrison.com/sites/cargan

Poor results on Mandarin singing voice data

WelkinYang opened this issue

Thank you for your work. I used this repository to experiment on a Mandarin singing voice dataset. The result after 500k training steps is not satisfactory. The main problem is that the spectrogram looks as if it were stitched together chunk by chunk: there are very obvious vertical streaks, and they are clearly audible.
[two images attached]

I am using the default hyperparameter configuration. How can I avoid this problem?

Boundary artifacts may arise for some types of data. This is mentioned in the paper. If you figure out how to address those artifacts, we'd love to know as well.

My guess is that the better the GAN fits the data distribution, the less likely the artifacts become. So more data, more training steps, and simpler (e.g., denoised, dereverberated) data will probably help.
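A rough way to check whether the streaks really coincide with chunk boundaries is to compare the sample-to-sample jump at each boundary with the typical jump elsewhere in the generated waveform. The sketch below is not part of the cargan codebase: the function name `boundary_discontinuity_ratio` is hypothetical, the default chunk size of 2048 samples is an assumption (use the value from your own configuration), and it assumes the generated audio starts exactly on a chunk boundary.

```python
import numpy as np


def boundary_discontinuity_ratio(audio, chunk_size=2048):
    """Mean jump across assumed chunk boundaries divided by the mean jump elsewhere.

    chunk_size=2048 is an assumption; replace it with the chunk length
    used by your generator.
    """
    audio = np.asarray(audio, dtype=np.float64)
    # jumps[i] is the absolute step from sample i to sample i + 1
    jumps = np.abs(np.diff(audio))
    # Steps that cross an assumed boundary: from k*chunk_size - 1 to k*chunk_size
    boundary_idx = np.arange(chunk_size - 1, len(jumps), chunk_size)
    if len(boundary_idx) == 0:
        raise ValueError("audio is too short to contain a chunk boundary")
    mask = np.zeros(len(jumps), dtype=bool)
    mask[boundary_idx] = True
    # Small constant avoids division by zero on silent input
    return jumps[mask].mean() / (jumps[~mask].mean() + 1e-12)
```

A ratio close to 1 suggests the boundaries are no rougher than the rest of the signal, while a ratio well above 1 suggests discontinuities at the chunk seams. This only catches waveform-level jumps, so it may miss artifacts that are smooth in the time domain, and it says nothing about how to remove them. It can still be useful for tracking whether more data or more training steps reduces the problem.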