Orca invalid characters issue

Question

lyonsno opened this issue 6 months ago · comments

Hey Picovoice team. I’m a big fan of your mission and I’m enjoying playing with the Python SDK for the Orca beta, but I’ve been running into issues where certain common characters cause UTF-8 decoding exceptions to be thrown.

There may be more, but so far I’ve noticed that:

Any integers, even when wrapped in strings, e.g. “I need 10 bananas”
Colons - ‘:’
Semi-colons - ‘;’
Dashes - ‘-’
Underscores - ‘_’
Newlines - ‘\n’ or ‘\n’
And, I believe (but need to double-check), quotation marks, even when wrapped in single quotes, i.e. 'Susan said "hi" to Bob'

(Ignore the fancy quotes, I’m typing this on a phone, they don’t exist in my actual inputs)

will all cause orca.synthesize_to_file() to throw invalid Unicode decoding errors. This happens in Python programs, your CLI, a CLI I wrote, and in the Python interpreter itself.

Is this a known limitation? Am I doing something wrong? Looking forward to hearing from you, I’d love to be able to use this as a solution for my project.

bejager · Answer 1 · Tue Feb 06 2024 03:44:19 GMT+0800 (China Standard Time)

Hi @lyonsno ,
thanks for your report.
For our beta version we have limited support for input characters. More detailed information can be found in our docs (https://picovoice.ai/docs/api/orca-python/), and we will update our README to make this clearer.
We support lower-case and upper-case letters and 6 punctuation symbols that can be retrieved by calling valid_punctuation_symbols: [".", ":", ",", "\"", "?", "!"].
For all other inputs, you can use custom pronunciations, e.g.:
"I need {ten|T EH N} bananas"
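
The rule described in this answer can be checked before calling Orca. The following is a minimal sketch in plain Python, not part of the Orca SDK: the function name `find_unsupported_characters`, the hard-coded symbol set, and the regex for `{word|PHONEMES}` blocks are assumptions made for illustration, based only on the characters listed above. In real code the symbol list would more safely come from the SDK itself.

```python
import re

# Assumption: ASCII letters, spaces, and the six symbols quoted in the
# answer above are accepted by the beta. This set is copied from the thread.
VALID_PUNCTUATION = {".", ":", ",", '"', "?", "!"}

# Matches a custom pronunciation block such as {ten|T EH N}.
CUSTOM_PRONUNCIATION = re.compile(r"\{[^{}|]+\|[^{}|]+\}")


def find_unsupported_characters(text):
    """Return a sorted list of characters that fall outside the assumed set."""
    # Custom pronunciation blocks are treated as valid units, so blank them out
    # before inspecting individual characters.
    remainder = CUSTOM_PRONUNCIATION.sub(" ", text)
    unsupported = set()
    for ch in remainder:
        is_letter = ch.isascii() and ch.isalpha()
        if not (is_letter or ch == " " or ch in VALID_PUNCTUATION):
            unsupported.add(ch)
    return sorted(unsupported)


if __name__ == "__main__":
    print(find_unsupported_characters("I need 10 bananas"))
    print(find_unsupported_characters("I need {ten|T EH N} bananas"))
```

An empty result means the text only uses characters from the assumed set; it does not guarantee that synthesis will succeed.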

Can you please let us know which platform you are running on? Please use our issue templates in the future.
We will investigate why an invalid Unicode error is thrown, instead of an error message.

Noah Lyons · Answer 2 · Tue Feb 06 2024 05:16:02 GMT+0800 (China Standard Time)

I’m running on a 2021 16-inch M1 Mac with OS X Monterey and Python 3.11. I must have missed the part about only letters, but I don’t think the colon showed up when I checked that list. I could be misremembering though. Btw, are you guys using GPU acceleration? It would be nice if there were a little less latency. Are you planning on expanding the number of supported characters? Being able to handle more characters would be necessary

bejager · Answer 3 · Tue Feb 06 2024 06:22:52 GMT+0800 (China Standard Time)

We were able to identify an issue with the Python SDK error reporting and are working on an update.

We are aware of some of the limitations of this beta version of Orca and are working on several improvements for the main release; expanding the number of supported characters, as well as making the model faster are on the roadmap.

Noah Lyons · Answer 4 · Thu Feb 08 2024 08:01:05 GMT+0800 (China Standard Time)

Great to hear! I’m especially excited about the upcoming ability to inflect with emotion, and the pronunciation dictionary is a great start towards correcting consistent errors; 'kinda', for example, is never pronounced correctly. Speaking of which, I have a decently successful solution implemented for cleaning text input, which increases robustness while keeping many of the sounds/words/phrases that the current limits prevent from being synthesized. It's a workaround, but it could potentially help people test the feature out. Is there a good place for me to share it? Also, happy to report that after better error handling and a restart, latency (at least on my 16GB 2021 MBP 16") is low enough for indefinite continuous speech, given proper async management. Thanks for your response, Noah Lyons

bejager · Answer 5 · Fri Feb 16 2024 02:35:12 GMT+0800 (China Standard Time)

Hi @lyonsno ,
We released a patch for the Python version that includes a few fixes (fixed error reporting, support for hyphens ("-"), better pronunciations). You might want to give it a try.

We are always happy to see people building on top of our products. Feel free to create a fork of the orca repo and link it here.

Fabio Manganiello · Answer 6 · Mon Apr 08 2024 18:45:43 GMT+0800 (China Standard Time)

+1 on this.

I'm building an end-to-end voice assistant that dispatches voice commands gathered via Cheetah to ChatGPT (or compatible models) and uses Orca to render the response as audio.

Special punctuation characters aren't a big deal. Since I already have a text transcript, I can easily replace hyphens/underscores with spaces, colons/semicolons/newlines with periods etc.
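
The replacement described above could look roughly like this. It is a sketch under the mapping stated in this comment (hyphens/underscores to spaces, colons/semicolons/newlines to periods); the function name `normalize_punctuation` and the clean-up of doubled periods and whitespace are illustrative additions, not something taken from the Orca SDK.

```python
import re

# Mapping taken from the comment above. A newline becomes ". " so that the
# next sentence does not get glued to the period.
_REPLACEMENTS = str.maketrans(
    {"-": " ", "_": " ", ":": ".", ";": ".", "\n": ". "}
)


def normalize_punctuation(text):
    """Replace punctuation the beta rejects with supported equivalents."""
    replaced = text.translate(_REPLACEMENTS)
    # Collapse runs of periods (optionally separated by spaces) into one,
    # which happens when a sentence already ended right before a newline.
    replaced = re.sub(r"\.(?:\s*\.)+", ".", replaced)
    # Collapse repeated whitespace and trim the ends.
    return re.sub(r"\s+", " ", replaced).strip()


if __name__ == "__main__":
    print(normalize_punctuation("to-do_list: eggs; milk\nbread"))
```

Note that this flattens colons as well, even though the colon appears in the list of supported symbols quoted earlier in the thread; drop that entry from the mapping if colons should be kept.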

But numbers are quite a problem - a response is quite likely to contain digits/numbers in some form.

A workaround for now may be to just pre-process the string via a regex that replaces all digits with their num2words representation before feeding it to Orca. But this would only work for Python - and it assumes that the client knows the target language.
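
A rough sketch of that regex pre-processing step follows. To keep it self-contained, it uses a small hand-written English converter (`int_to_words`, a stand-in written for this example) instead of num2words, and the `to_words` parameter is a hypothetical hook where a real converter could be plugged in. It assumes English output and only handles plain runs of digits, so decimals, dates, and currency are out of scope.

```python
import re

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def int_to_words(n):
    """Spell out 0 <= n < 1_000_000 in English, using spaces, not hyphens."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + (" " + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return _ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return int_to_words(thousands) + " thousand" + (" " + int_to_words(rest) if rest else "")


def spell_out_numbers(text, to_words=int_to_words):
    """Replace every run of digits in text with its spelled-out form."""
    def replace(match):
        digits = match.group()
        # Long runs and runs with leading zeros (IDs, phone numbers) are
        # read digit by digit instead of as one quantity.
        if len(digits) > 6 or (len(digits) > 1 and digits[0] == "0"):
            return " ".join(_ONES[int(d)] for d in digits)
        return to_words(int(digits))

    return re.sub(r"\d+", replace, text)


if __name__ == "__main__":
    print(spell_out_numbers("I need 10 bananas"))
```

With num2words installed, something like `spell_out_numbers(text, to_words=lambda n: num2words(n, lang="en"))` should work instead, though its English output contains hyphens (for example "twenty-one"), which may need the punctuation replacement applied afterwards.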

Are there any plans to also handle numbers? If so, then I can wait for a new release. Otherwise I may go for the num2words+regex workaround in my implementation. I could also prepare a PR that pre-processes numbers in the text before Orca processes it, but my num2words workaround would only work for the Python bindings then.

bejager · Answer 7 · Tue Apr 09 2024 03:39:23 GMT+0800 (China Standard Time)

Hi @blacklight ,
thanks for the feedback. We are working on a release that will support handling more types of inputs, including numbers. Keep an eye open for our next release.