collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Home Page: https://collabora.github.io/WhisperSpeech/


Map architecture in config.json and tokenizer.json files on HuggingFace

bartekupartek opened this issue

I've experimented with Bark and your model, and I found yours simpler to follow and lighter than Bark. I'd like to port it to the Elixir Bumblebee project. It seems that the pipeline.py file, which essentially combines a speaker component, text-to-semantic, semantic-to-audio, and a vocoder, is all I need to adapt to enable TTS in my favorite language. I tried to load WhisperSpeech from HuggingFace in Elixir Bumblebee but got stuck at the very beginning because the required config.json and tokenizer.json files, and perhaps the safetensors files, are missing. Are you planning to support this, or could anyone provide or point me to the required fields and values? This would let me load all the models natively. The alternative would be ONNX Runtime, but that would add extra overhead in my case.
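To make the stage layout described above concrete, here is a minimal Python sketch of how the four stages could be chained. It is not the WhisperSpeech API: the class name `TTSPipeline`, its field names, and the toy stub functions are all hypothetical placeholders, and the real stages are neural models rather than one-line lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TTSPipeline:
    """Hypothetical wiring of the four stages named in the issue."""
    speaker: Callable[[Sequence[float]], List[float]]              # reference audio -> speaker vector
    text_to_semantic: Callable[[str, List[float]], List[int]]      # text -> semantic tokens
    semantic_to_audio: Callable[[List[int], List[float]], List[int]]  # semantic -> audio tokens
    vocoder: Callable[[List[int]], List[float]]                    # audio tokens -> waveform samples

    def generate(self, text: str, reference_audio: Sequence[float]) -> List[float]:
        spk = self.speaker(reference_audio)
        semantic = self.text_to_semantic(text, spk)
        audio_tokens = self.semantic_to_audio(semantic, spk)
        return self.vocoder(audio_tokens)


def build_demo_pipeline() -> TTSPipeline:
    # Toy stand-ins so the wiring can be exercised without any model weights.
    return TTSPipeline(
        speaker=lambda audio: [sum(audio) / len(audio)],
        text_to_semantic=lambda text, spk: [ord(c) % 256 for c in text],
        semantic_to_audio=lambda sem, spk: [t * 2 for t in sem],
        vocoder=lambda toks: [t / 512.0 for t in toks],
    )


if __name__ == "__main__":
    print(build_demo_pipeline().generate("hi", [0.0, 1.0]))
```

A port to another runtime would need to reproduce each stage separately, which is why per-model configuration and weight files matter for this request.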

Hey, I am not sure how the Hugging Face models are used in Bumblebee. I followed a naming convention similar to Hugging Face's, but the model is implemented from scratch in PyTorch.
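For readers wondering what producing such a file might involve, the following is a rough Python sketch that writes a set of hyperparameters to a `config.json`. The helper `write_config`, the `model_type` value, and every hyperparameter name and number are invented for illustration; they are not the fields WhisperSpeech or Bumblebee actually use, and the correct mapping would have to come from the model source.

```python
import json
from pathlib import Path


def write_config(hparams: dict, model_type: str, out_dir: str) -> Path:
    """Write a Hugging Face style config.json (hypothetical field names)."""
    config = {"model_type": model_type, **hparams}
    path = Path(out_dir) / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    # Placeholder values only; real values must be read from the checkpoint.
    example_hparams = {"hidden_size": 512, "num_layers": 6, "num_heads": 8}
    print(write_config(example_hparams, "example-tts", "./exported").read_text())
```

A file like this only helps if the loading side has an architecture implementation that understands the same field names, so the mapping has to be agreed on both ends.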

ONNX Runtime may work, but I think their LLM support (and the architecture is pretty much like an LLM) was only released in the most recent version, so you may run into some issues.