After the first wakeup, speech commands are not recognized immediately (AIS-998)

Question

upupycl opened this issue 2 years ago · comments

Hello, I have a problem that I don't know how to solve. Could you please give me some ideas? Thanks for your help.

Problem description:
I bought a DIY development board (esp32s3-wroom-1) on Taobao, and the merchant's demo runs on it without problems.
However, the demo uses a very old version of esp-sr, so I cloned the newest esp-sr to develop with, and then the following problem occurred:

When the board starts running and I wake it up for the first time, it wakes up normally. I then speak a speech command, but there is no reaction until it times out, and it has to be woken up again. When I wake it up a second time and speak the command, it recognizes commands properly. I don't know why speech commands are not recognized after the first wakeup.

I tested many times and found something that may be helpful:
When it wakes up, I added a printf to print the channel value: `printf("channel: %d\n", res->trigger_channel_id);`
On the first wakeup the channel value is 2 and speech commands are not recognized. After the second wakeup the channel value changes to 1, and speech commands are recognized.

The demo's model initialization code (the newest esp-sr uses a different initialization method):

`static const esp_wn_iface_t *wakenet = &WAKENET_MODEL;
 static const model_coeff_getter_t *model_coeff_getter = &WAKENET_COEFF;
 model_data = wakenet->create(model_coeff_getter, DET_MODE_95);              //initialize wakenet

static const esp_mn_iface_t *multinet = &MULTINET_MODEL;
model_data_mn = multinet->create(&MULTINET_COEFF, 4000);                     //initialize multinet`

Following the esp-skainet examples and esp-box, my project's model initialization code (using the AFE interface):

 `afe_handle = &ESP_AFE_SR_HANDLE;                                            //initialize  afe
  afe_config_t afe_config = AFE_CONFIG_DEFAULT();
  afe_config.wakenet_model_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);
  afe_data = afe_handle->create_from_config(&afe_config);

  mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_CHINESE);       //initialize  multinet
  printf("multinet:%s\n", mn_name);
  multinet = esp_mn_handle_from_name(mn_name);
  model_data = multinet->create(mn_name, 6000);
  esp_mn_commands_update_from_sdkconfig(multinet, model_data);     // Add speech commands 


  afe_fetch_result_t* res = g_sr_data->afe_handle->fetch(afe_data);          //get audio data and detect
  if (res->wakeup_state == WAKENET_DETECTED) 
  {
       ESP_LOGI(TAG, LOG_BOLD(LOG_COLOR_GREEN) "wakeword detected");
  }
  if (res->wakeup_state == WAKENET_CHANNEL_VERIFIED) 
  {
      ESP_LOGI(TAG, LOG_BOLD(LOG_COLOR_GREEN) "Channel verified");
      printf("channel: %d\n",res->trigger_channel_id);
  }`

Sun Xiangyu · Answer 1 · Tue Sep 27 2022 19:31:34 GMT+0800 (China Standard Time)

I can't reproduce your problem.
When I use esp-skainet/examples/en_speech_commands_recognition, speech commands are recognized on the first wakeup. Maybe your environment is noisy.
The first recognition is indeed different from later ones. As you observed, the channel of the first wakeup is 2, which means that the raw microphone data is used for the first command recognition. Channels 0 and 1 are BSS (blind source separation) outputs.
We choose raw data the first time because the BSS algorithm needs some time to converge.
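
To make the channel numbering above easier to check in a log, here is a minimal sketch of a helper that turns `trigger_channel_id` into a readable label. The function name `describe_trigger_channel` is hypothetical and not part of esp-sr. The mapping (0 and 1 are BSS outputs, 2 is raw microphone data) is taken only from the explanation in this thread and may differ for other microphone configurations or esp-sr versions.

```c
#include <stdio.h>

/* Hypothetical helper (not an esp-sr API).
 * Mapping assumed from this thread for a two-microphone setup:
 *   0, 1 -> BSS output channels
 *   2    -> raw microphone data (used before BSS has converged) */
static const char *describe_trigger_channel(int trigger_channel_id)
{
    switch (trigger_channel_id) {
    case 0:
    case 1:
        return "BSS output";
    case 2:
        return "raw microphone";
    default:
        return "unknown";
    }
}

/* Hypothetical logging wrapper: prints the id together with its label. */
static void log_trigger_channel(int trigger_channel_id)
{
    printf("channel: %d (%s)\n", trigger_channel_id,
           describe_trigger_channel(trigger_channel_id));
}
```

In the detection loop, `log_trigger_channel(res->trigger_channel_id);` could replace the bare `printf` so that the first (raw) wakeup and later (BSS) wakeups are easy to tell apart.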

upupycl · Answer 2 · Wed Sep 28 2022 11:02:22 GMT+0800 (China Standard Time)

OK. First, thanks for your reply.
I thought of a method to work around this problem. Is it feasible?

The method is as follows:
I will prepare an audio file (.wav) containing the wake word. At startup, I feed the content of the file in through I2S to wake the board up. It probably still won't recognize speech commands at that point, but I don't care. After a while, I wake it up through the microphone, and maybe it can then recognize speech commands normally.

Can I wake it up from a file instead of the microphone? If so, how should I do it? Is there an interface for this?

Sun Xiangyu · Answer 3 · Thu Sep 29 2022 15:18:02 GMT+0800 (China Standard Time)

We don't have an existing interface, but you can do the following:

1. Read the file from flash or an SD card.
2. Feed the file data with the afe->feed() function.
3. Once the file has been fed, go back to feeding I2S data.

However, I'm not sure this method will be effective. Your file data and I2S data are not contiguous, so BSS needs some time to re-converge.
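
The feeding step could be sketched roughly as below. This is a host-testable outline, not esp-sr code: `feed_chunk_fn`, `feed_pcm_in_chunks`, and `count_samples` are hypothetical names. On the device, the callback would be assumed to wrap `afe_handle->feed(afe_data, chunk)`, and `chunk_samples` would be assumed to equal the AFE feed chunk size multiplied by the number of input channels, with the file's PCM data already arranged in the channel layout the AFE expects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the AFE feed call. On the device this callback
 * would be assumed to forward the chunk to afe_handle->feed(). */
typedef void (*feed_chunk_fn)(void *ctx, const int16_t *chunk, size_t samples);

/* Feed `total` samples of pre-recorded PCM in fixed-size chunks.
 * `scratch` must hold at least `chunk_samples` samples. The final partial
 * chunk is zero-padded so every call delivers a full chunk.
 * Returns the number of chunks passed to `feed`. */
static size_t feed_pcm_in_chunks(const int16_t *pcm, size_t total,
                                 int16_t *scratch, size_t chunk_samples,
                                 feed_chunk_fn feed, void *ctx)
{
    size_t chunks = 0;

    if (pcm == NULL || scratch == NULL || feed == NULL || chunk_samples == 0) {
        return 0;
    }

    for (size_t off = 0; off < total; off += chunk_samples) {
        size_t n = total - off;
        if (n > chunk_samples) {
            n = chunk_samples;
        }
        /* Copy the available samples, then pad the rest with silence. */
        memcpy(scratch, pcm + off, n * sizeof(int16_t));
        memset(scratch + n, 0, (chunk_samples - n) * sizeof(int16_t));
        feed(ctx, scratch, chunk_samples);
        chunks++;
    }
    return chunks;
}

/* Example sink for host-side checks: counts the samples it receives. */
static void count_samples(void *ctx, const int16_t *chunk, size_t samples)
{
    (void)chunk;
    *(size_t *)ctx += samples;
}
```

After `feed_pcm_in_chunks` returns, the feed task would go back to reading from I2S as before. As noted above, the switch from file data to live data is a discontinuity, so this sketch does not guarantee that BSS will have converged by the next real wakeup.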