7. Train the final models

Question

jpc opened this issue 2 years ago

Once all the bugs are ironed out (#4), we have a text-to-semantic model (#9), we have improved the speech codec (#10), and we have more high-quality data (#11), we will train the final models, which should match (or even exceed) the quality Google showed on their SPEAR TTS demo page.

Joost van Berkel · Answer 1 · Fri Apr 14 2023 19:34:53 GMT+0800 (China Standard Time)

Hi @jpc, how's the process going? I've been following you for a while now. Have you achieved any cool results yet? Looking forward to it!

152334H · Answer 2 · Sat Apr 15 2023 15:53:40 GMT+0800 (China Standard Time)

Just FYI, this zolero person is looking to voraciously monetise your models. You may want to release checkpoints under some non-commercial license if that bothers you.

Jakub Piotr Cłapa · Answer 3 · Mon Apr 17 2023 19:47:08 GMT+0800 (China Standard Time)

@zolero Have you seen the new JFK speech resynthesized from just text (in the "wrong" voice for now) in the README? We are working on multi-speaker support and on scaling the models, so in the next two weeks we should be able to show much better results.

@152334H Thanks for the heads-up, but at Collabora we are into Open Source exactly because other people can also benefit from our work. We'd love to support them with commercial contracts, but it's not in our mission to stop them, whether by using non-commercial licenses or by switching to an open-core model.

Jakub Piotr Cłapa · Answer 4 · Tue Jan 09 2024 20:34:20 GMT+0800 (China Standard Time)

We are no longer embarrassed by the quality of the models, so we've reached the MVP stage. :)

But stay tuned, we'll continue improving the models, performance, controllability, the API, etc.