Picovoice / orca

On-device streaming text-to-speech engine powered by deep learning

Orca invalid characters issue

lyonsno opened this issue · comments

Hey Picovoice team. I’m a big fan of your mission and I’m enjoying playing with the Python SDK for the Orca beta, but I’ve been running into certain common characters that cause UTF-8 decoding exceptions to be thrown.

There may be more, but so far I’ve noticed the following:

Any integers, even when wrapped in strings - “I need 10 bananas”
Colons - ‘:’
Semi-colons - ‘;’
Dashes - ‘-’
Underscores - ‘_’
Newlines - ‘\n’ or ‘\n’
And - I believe, but need to double check - quotation marks, even when wrapped in single quotes, i.e. 'Susan said "hi" to Bob'

(Ignore the fancy quotes, I’m typing this on a phone, they don’t exist in my actual inputs)

All of these cause orca.synthesize_to_file() to throw invalid Unicode decoding errors. This happens in Python programs, your CLI, a CLI I wrote, and in the Python interpreter itself.

Is this a known limitation? Am I doing something wrong? Looking forward to hearing from you, I’d love to be able to use this as a solution for my project.

Hi @lyonsno ,
thanks for your report.
For our beta version we have limited support for input characters. More detailed information can be found in our docs, and we will update our README to make this more clear.
We support lower-case and upper-case letters and 6 punctuation symbols that can be retrieved by calling valid_punctuation_symbols: [".", ":", ",", "\"", "?", "!"].
For all other inputs, you can use custom pronunciations, e.g.:
"I need {ten|T EH N} bananas"

Can you please let us know which platform you are running on? Please use our issue templates in the future.
We will investigate why an invalid Unicode error is thrown, instead of an error message.
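
For readers hitting the same error, here is a minimal sketch of a pre-check that flags characters outside the set described in this reply before the text reaches Orca. The function name `find_unsupported` is hypothetical, the punctuation set is copied from the list quoted above (real code would read it from valid_punctuation_symbols), and treating the space character and ASCII letters as the only other valid input is an assumption rather than documented behaviour.

```python
import re
import string

# Assumption: the six symbols quoted in the reply above. In real code these
# would come from the engine's valid_punctuation_symbols instead.
BETA_PUNCTUATION = frozenset('.:,"?!')

# Matches a custom pronunciation such as {ten|T EH N}, so that its braces
# and pipe are not reported as unsupported characters.
CUSTOM_PRONUNCIATION = re.compile(r"\{[^{}|]+\|[^{}|]+\}")


def find_unsupported(text, punctuation=BETA_PUNCTUATION):
    """Return the distinct unsupported characters in order of first appearance."""
    # Assumption: ASCII letters, the given punctuation, and spaces are valid.
    allowed = set(string.ascii_letters) | set(punctuation) | {" "}
    plain = CUSTOM_PRONUNCIATION.sub(" ", text)
    seen = []
    for ch in plain:
        if ch not in allowed and ch not in seen:
            seen.append(ch)
    return seen


print(find_unsupported("I need 10 bananas"))            # digits are flagged
print(find_unsupported("I need {ten|T EH N} bananas"))  # nothing is flagged
```

An empty result does not guarantee that synthesis succeeds; it only catches the character classes discussed in this thread.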

We were able to identify an issue with the Python SDK error reporting and are working on an update.

We are aware of some of the limitations of this beta version of Orca and are working on several improvements for the main release: expanding the set of supported characters and making the model faster are both on the roadmap.

Hi @lyonsno ,
we released a patch for the Python version that includes a few fixes (fixed error reporting, support for hyphens ("-"), better pronunciations). You might want to give it a try.

We are always happy to see people building on top of our products. Feel free to create a fork of the orca repo and link it here.

+1 on this.

I'm building an end-to-end voice assistant that dispatches voice commands gathered via Cheetah to ChatGPT (or compatible models) and uses Orca to render the response as audio.

Special punctuation characters aren't a big deal. Since I already have a text transcript, I can easily replace hyphens/underscores with spaces, colons/semicolons/newlines with periods etc.
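
The replacements described above could be sketched roughly as follows. The function name `normalize_punctuation` is hypothetical, and the choice to collapse a period that lands directly after existing sentence punctuation is an extra assumption, added so that a line ending in "?" does not become "?.".

```python
import re


def normalize_punctuation(text):
    """Map punctuation the beta rejects onto spaces and periods."""
    # Hyphens and underscores become spaces.
    text = re.sub(r"[-_]", " ", text)
    # Colons, semicolons and newlines become sentence breaks.
    text = re.sub(r"\s*[:;\n]+\s*", ". ", text)
    # Drop periods that now directly follow existing punctuation.
    text = re.sub(r"([.?!,])(\s*\.)+", r"\1", text)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_punctuation("Step one: mix well-done\nStep two; serve"))
```

Colons are in the list of supported symbols quoted earlier in the thread, so replacing them is optional; they are included here only because the comment above mentions them.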

But numbers are quite a problem - a response is quite likely to contain digits/numbers in some form.

A workaround for now may be to just pre-process the string via a regex that replaces all digits with their num2words representation before feeding it to Orca. But this would only work for Python - and it assumes that the client knows the target language.
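
A minimal sketch of that regex workaround follows. To keep it self-contained it ships a small English-only converter, `small_number_to_words`, as a stand-in; both function names are hypothetical, and the `to_words` parameter is where a library call such as num2words could be plugged in instead. It handles non-negative integers only, so a decimal like 3.5 would be converted as two separate numbers around the period.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def small_number_to_words(n):
    """English words for a non-negative integer (stand-in converter)."""
    if n >= 1_000_000:
        # Outside the range this sketch covers: read the digits one by one.
        return " ".join(ONES[int(d)] for d in str(n))
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        # A space is used instead of a hyphen, since hyphens were rejected.
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + small_number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + small_number_to_words(rest) if rest else ""
    return small_number_to_words(thousands) + " thousand" + tail


def spell_out_numbers(text, to_words=small_number_to_words):
    """Replace every run of digits in `text` with its spelled-out form."""
    return re.sub(r"\d+", lambda m: to_words(int(m.group())), text)


print(spell_out_numbers("I need 10 bananas"))
```

If a library converter is swapped in, its output may itself contain hyphens or commas, so it would likely need to pass through the same punctuation clean-up as the rest of the text.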

Are there any plans to also handle numbers? If so, then I can wait for a new release. Otherwise I may go for the num2words+regex workaround in my implementation. I could also prepare a PR that pre-processes numbers in the text before Orca processes it, but my num2words workaround would only work for the Python bindings then.

Hi @blacklight ,
thanks for the feedback. We are working on a release that will support handling more types of inputs, including numbers. Keep an eye open for our next release.