csdcorp / speech_to_text

A Flutter plugin that exposes device specific speech to text recognition capability.

Flutter Web, Android Chrome, speech to text returns spoken words multiple times

longtimedeveloper opened this issue

Hi,

I have a Flutter web app that runs flawlessly on Windows Chrome.

When I run the same app on my Android Galaxy S20 FE, I can speak 5 words but speech_to_text returns 5, 10, 15, or 20 words. It repeats them. Some words are repeated 10 times.

Is there anything I can do to overcome this?

Thank you!

hmm...that's distressing. Which version of STT are you using? Could you try 6.3.0 and see if it has the same behaviour? I made a change recently because I had misinterpreted the Web speech API. That change could easily lead to duplicates if the web speech API is implemented differently (incorrectly?) on different devices.

@sowens-csd I have created an Android version of the app and ran it on the same Android device. STT works perfectly there. Well, except for the horrible Android support for the pause delay. The Android pause delay is why I chose to create a web app.

I will try 6.3.0 tonight and let you know.

@sowens-csd I just tried 6.3.0. It did not work at all. No words recognized, ever. SpeechRecognitionResult result.recognizedWords is always an empty string after speaking.

Thanks for trying it. Does the example app also show duplicates on the Galaxy Android Chrome? Do you have access to another android mobile device to test on?

@sowens-csd no I only have this device. I am currently a volunteer in Ukraine near the frontlines. If you can upload the sample app to a website, and send me the link, I will try it for you, no problem.

I really hope you can sort out the duplicates issue. Sometimes I get 2 or 3 duplicates, then other times 20 or more duplicates for the same words spoken.

Thank you very much!

Can you give me a couple of examples of what you say vs. what you receive? I'd like to understand what happens for a single word vs. a phrase. If you say one word, I assume you get multiple copies of that word? If you say a phrase, do you get each word in the phrase duplicated, or the whole sentence duplicated, or something else? Also, are there alternates in the results or just a single value with duplicates in that value?

For example:

Say: hello
Receive: ["hello hello hello"]

Say: where are you
Receive: ["where are you where are you where are you"]

@sowens-csd no problem, very happy to provide screen shots and the word or words I spoke to you tomorrow.

I have been up since 0200 and it's 2100 now. They keep firing artillery at us too. Another crazy night in Kherson.

First thing tomorrow morning. Thank you!!

Not sure if this qualifies as good news but I have managed to reproduce the issue on a Galaxy S9. I'll have a look at it and see what's happening.

Hoping for a more peaceful time for you, no one should have to go through that.

I started looking into this and found some useful information. There appears to be a known issue with Chrome on Android duplicating words, though there isn't much information about it. It does give me some things to explore:
https://stackoverflow.com/questions/35112561/speech-recognition-api-duplicated-phrases-on-android

@sowens-csd if I speak a single word, it does not get duplicated.

In this session, I spoke "hello" twice.

[Screenshot: spoke hello twice]

In this session, I spoke "glory" three times.

[Screenshot: spoke glory 3 times two]

In this session, I spoke "glory" three times.

[Screenshot: spoke glory three times]

In this session, I spoke "glory" three times.

[Screenshot: went crazy]

Thanks a lot for this. I'm seeing similar results, which is good as I can more easily troubleshoot. I originally made the change in response to PR #436. The issue there was that on the desktop when the user pauses speaking the recognition result splits the response into two phrases. So the plugin now accounts for that and concatenates multiple phrases into a single result.

On Android web the interaction is different. It doesn't seem to split into separate phrases on a pause and it seems to be returning multiple phrases with duplicates of the spoken content as it processes. The good news is that the correct complete result does seem to be in the set of phrases. I just need to find a clean way to throw out the junk.
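To make the difference concrete, here is a minimal sketch of the two handling strategies described above. It is not the plugin's actual code: the function names are hypothetical, the input list is made up for illustration, and "keep only the last phrase" is an assumption about one way the junk could be thrown out.

```dart
/// Desktop-style handling: a pause splits the speech into separate phrases,
/// so every non-empty phrase is joined into one result.
String aggregate(List<String> phrases) =>
    phrases.map((p) => p.trim()).where((p) => p.isNotEmpty).join(' ');

/// Mobile-style handling (ASSUMPTION, not the plugin's confirmed logic):
/// the phrases repeat the spoken content as it is processed, so keep only
/// the last non-empty phrase and discard the rest.
String lastPhraseOnly(List<String> phrases) {
  final nonEmpty =
      phrases.map((p) => p.trim()).where((p) => p.isNotEmpty).toList();
  return nonEmpty.isEmpty ? '' : nonEmpty.last;
}

void main() {
  // Desktop: two phrases separated by a pause.
  print(aggregate(['where are', 'you'])); // where are you

  // Illustrative mobile input: each phrase repeats what came before.
  final mobile = ['where', 'where are', 'where are you'];
  print(aggregate(mobile)); // where where are where are you
  print(lastPhraseOnly(mobile)); // where are you
}
```

Joining everything is right when phrases are disjoint and wrong when they overlap, which is why one strategy cannot serve both browsers.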

I just pushed some changes to the repo that I think might resolve it. When you have a chance could you try the repo version? Here's the initialization change you can use for mobile browsers.

      var hasSpeech = await speech.initialize(
          onError: errorListener,
          onStatus: statusListener,
          options: [SpeechToText.webDoNotAggregate]);

Since the mobile and desktop browsers respond differently and the plugin can't tell which it is running on you have to tell it. I think that using something like device_info_plus you could find out the browser type and optionally include this? Let me know how it works on your device.
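One way to decide when to include the option is to look at the browser's user-agent string. The sketch below is only a heuristic under stated assumptions: the helper names are hypothetical, the user-agent strings are illustrative samples, and a plain string stands in for SpeechToText.webDoNotAggregate so the block runs without Flutter.

```dart
/// Returns true when the user-agent string looks like a mobile browser.
/// Heuristic only: checks for the "android", "iphone", and "ipad" tokens.
bool looksLikeMobileBrowser(String userAgent) {
  final ua = userAgent.toLowerCase();
  return ua.contains('android') || ua.contains('iphone') || ua.contains('ipad');
}

/// Builds the options list, including [mobileOnlyOption] only when the
/// user agent looks like a mobile browser.
List<T> webOptionsFor<T>(String userAgent, T mobileOnlyOption) =>
    looksLikeMobileBrowser(userAgent) ? <T>[mobileOnlyOption] : <T>[];

void main() {
  // Illustrative user-agent strings, not captured from a real device.
  const androidChrome = 'Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36';
  const desktopChrome = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

  // The string below stands in for SpeechToText.webDoNotAggregate.
  print(webOptionsFor(androidChrome, 'webDoNotAggregate')); // [webDoNotAggregate]
  print(webOptionsFor(desktopChrome, 'webDoNotAggregate')); // []
}
```

In an app, the result of webOptionsFor(userAgent, SpeechToText.webDoNotAggregate) could be passed as the options argument to speech.initialize. The user agent itself would have to come from the browser, for example through device_info_plus's web browser info, which I believe exposes it.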

@sowens-csd OK. I can run the sample in the repo on my device.

@sowens-csd I tried the example app. It works on Desktop Chrome.

However, on Chrome for Android, after pressing the Start button, nothing happens and it returns to its available state. In other words, it will not take input.

I have tried everything I can think of, but I can't get it to work.

Yes, for the website on Chrome I added the new setting, options: [SpeechToText.webDoNotAggregate].

How odd. I tried it on Chrome Android and got good results. You're getting no errors? I don't think what I did would affect whether it takes results or not, just how it reports them.

What happens if you don't set that option?

@sowens-csd with or without the option I'm getting the same results. Maybe something else caused the problem. There is no way for me to debug a website on a device. I have to run in release mode.

If it is working for you, maybe best to publish the package and I'll use the package and get the same results you are getting.

I did update the Flutter and SDK versions in the pubspec.yaml file to the latest version of Flutter that was just released. I noticed the example app has older versions in its pubspec.yaml. Not sure if this is the source of the problem.

I have no way of seeing errors on my Android device, so I am not sure what is happening.

I wish I could be more help.

name: speech_to_text
description: A Flutter plugin that exposes device specific speech to text recognition capability.
version: 6.4.1
homepage: https://github.com/csdcorp/speech_to_text

environment:
  sdk: '>=3.2.0 <4.0.0'
  flutter: '>=3.10.0'

dependencies:
  flutter:
    sdk: flutter
  speech_to_text_platform_interface: ^2.1.0
  speech_to_text_macos: ^1.0.2
  json_annotation: ^4.0.0
  clock: ^1.0.1
  pedantic: ^1.9.2
  flutter_web_plugins:
    sdk: flutter
  meta: ^1.1.7
  js: ^0.6.3

dev_dependencies:
  flutter_test:
    sdk: flutter
  build_runner: ^2.4.4
  json_serializable: ^6.7.0
  fake_async: ^1.3.1
  mockito: ^5.4.1
  plugin_platform_interface: ^2.1.4
  flutter_lints: ^3.0.0

flutter:
  plugin:
    platforms:
      android:
        package: com.csdcorp.speech_to_text
        pluginClass: SpeechToTextPlugin
      ios:
        pluginClass: SpeechToTextPlugin
      web:
        pluginClass: SpeechToTextPlugin
        fileName: speech_to_text_web.dart
      macos:
        default_package: speech_to_text_macos

name: speech_to_text_example
description: Demonstrates how to use the speech_to_text plugin.
version: 1.1.0
publish_to: 'none'

environment:
  sdk: '>=3.2.0 <4.0.0'

dependencies:
  flutter:
    sdk: flutter

  speech_to_text:
    path: ../
  provider: ^6.0.5

dev_dependencies:
  flutter_test:
    sdk: flutter
  flutter_lints: ^3.0.0

# The following section is specific to Flutter.

flutter:
  uses-material-design: true

  assets:
    - assets/sounds/speech_to_text_listening.m4r
    - assets/sounds/speech_to_text_cancel.m4r
    - assets/sounds/speech_to_text_stop.m4r

I opened up remote Chrome developer tools. Each time I press the Start button, I get the error that you see in the image.

It is probably something with the example app and not your package.

I have to go to a funeral for our teammate that was killed this week in an artillery attack.

[Screenshot: Capture]

I pushed 6.5.0 to pub.dev. Please let me know if you have a chance to try it.

My condolences on the death of your teammate.

I will do this first thing in the morning. Thank you very much for your outstanding assistance. Much appreciated.

@sowens-csd The update and adding the options: [SpeechToText.webDoNotAggregate] works perfectly on Android Chrome.

Thank you very much.