espressif / esp-sr

Speech recognition

After the first wakeup, speech commands can't be recognized immediately (AIS-998)

upupycl opened this issue · comments

Hello, I have a problem and don't know how to solve it. Could you please give me some ideas? Thanks for your help.

  • Problem description:
    I bought a DIY development board (esp32s3-wroom-1) on Taobao, and the merchant's demo runs on it without problems.
    However, the esp-sr version used by that demo is very old, so I cloned the newest esp-sr to develop with, and then the following problem occurred:

After the board starts running, the first wakeup works normally. I then speak a speech command, but there is no reaction until it times out, and I have to wake it up again. After the second wakeup, commands are recognized properly. I don't know why speech commands can't be recognized after the first wakeup.

  • I tested many times and found something that may be helpful:
    On wakeup, I added a printf to print the channel value: `printf("channel: %d\n",res->trigger_channel_id);`
    On the first wakeup the channel value is 2, and speech commands are not recognized. After the second wakeup the channel value changes to 1, and speech commands are recognized.

  • The demo's model initialization code (the newest esp-sr uses a different initialization method):

    `static const esp_wn_iface_t *wakenet = &WAKENET_MODEL;
     static const model_coeff_getter_t *model_coeff_getter = &WAKENET_COEFF;
     model_data = wakenet->create(model_coeff_getter, DET_MODE_95);              //initialize wakenet
    
    static const esp_mn_iface_t *multinet = &MULTINET_MODEL;
    model_data_mn = multinet->create(&MULTINET_COEFF, 4000);                     //initialize multinet`                    
    
  • Following the esp-skainet examples and esp-box, my project's model initialization code (using the AFE interface):

     `afe_handle = &ESP_AFE_SR_HANDLE;                                            //initialize  afe
      afe_config_t afe_config = AFE_CONFIG_DEFAULT();
      afe_config.wakenet_model_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);
      afe_data = afe_handle->create_from_config(&afe_config);
    
      mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_CHINESE);       //initialize  multinet
      printf("multinet:%s\n", mn_name);
      multinet = esp_mn_handle_from_name(mn_name);
      model_data = multinet->create(mn_name, 6000);
      esp_mn_commands_update_from_sdkconfig(multinet, model_data);     // Add speech commands 
    
    
      afe_fetch_result_t* res = g_sr_data->afe_handle->fetch(afe_data);          //get audio data and detect
      if (res->wakeup_state == WAKENET_DETECTED) 
      {
           ESP_LOGI(TAG, LOG_BOLD(LOG_COLOR_GREEN) "wakeword detected");
      }
      if (res->wakeup_state == WAKENET_CHANNEL_VERIFIED) 
      {
          ESP_LOGI(TAG, LOG_BOLD(LOG_COLOR_GREEN) "Channel verified");
          printf("channel: %d\n",res->trigger_channel_id);
      }`
    

I can't reproduce your problem.
When I use esp-skainet/examples/en_speech_commands_recognition, speech commands are recognized after the first wakeup. Maybe your environment is noisy.
The first recognition is indeed different from later ones. As you observed, the channel of the first wakeup is 2, which means that the raw microphone data is used for the first command recognition. Channel 0 and channel 1 are BSS (blind source separation) outputs.
We use raw data the first time because the BSS algorithm needs some time to converge.

OK, first of all, thanks for your reply.
I have thought of a method to work around this problem. Is it feasible?

The method is as follows:
I'm going to prepare an audio file (.wav) containing the wake word. At startup, I feed the content of the file in through I2S to
wake the board up. It may still fail to recognize speech commands that first time, but I don't care. After a while, I wake it up through
the microphone, and then it may recognize speech commands normally.

Can I wake it up from a file instead of the microphone? If so, how should I do it? Is there an interface for this?

We don't have an existing interface for this. You can do the following:

  1. Read the file from flash or an SD card.
  2. Feed the file data with the afe->feed() function.
  3. Once the whole file has been fed, switch back to feeding I2S data.

But I'm not sure whether this method is effective. Your file data and the I2S data are not contiguous, so BSS needs some time to re-converge.
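
The three steps above could be sketched roughly as follows. This is a minimal, host-testable illustration and not esp-sr code: `feed_fn_t`, `feed_pcm_in_chunks`, and `count_sink_t` are hypothetical names introduced here. It only shows how a PCM buffer (for example, the sample data of the .wav file, already loaded into memory) can be cut into fixed-size chunks, with the last chunk zero-padded, and handed to a feed callback one chunk at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a call such as afe_handle->feed(afe_data, chunk). */
typedef void (*feed_fn_t)(void *ctx, const int16_t *chunk);

/*
 * Feed `total` 16-bit samples from `pcm` in chunks of `chunk_len` samples.
 * `scratch` must have room for `chunk_len` samples. A final partial chunk
 * is zero-padded so the callback always receives a full chunk.
 * Returns the number of chunks handed to `feed`.
 */
size_t feed_pcm_in_chunks(const int16_t *pcm, size_t total,
                          int16_t *scratch, size_t chunk_len,
                          feed_fn_t feed, void *ctx)
{
    assert(pcm != NULL && scratch != NULL && feed != NULL && chunk_len > 0);

    size_t fed = 0;
    for (size_t off = 0; off < total; off += chunk_len) {
        size_t remaining = total - off;
        size_t n = remaining < chunk_len ? remaining : chunk_len;

        memcpy(scratch, pcm + off, n * sizeof(int16_t));
        memset(scratch + n, 0, (chunk_len - n) * sizeof(int16_t));

        feed(ctx, scratch);
        fed++;
    }
    return fed;
}

/* Illustration-only sink: counts chunks and sums the samples it receives. */
typedef struct {
    size_t chunks;    /* number of chunks received so far */
    size_t chunk_len; /* samples per chunk, set by the caller */
    long sum;         /* running sum of all received samples */
} count_sink_t;

void count_sink_feed(void *ctx, const int16_t *chunk)
{
    count_sink_t *sink = (count_sink_t *)ctx;
    for (size_t i = 0; i < sink->chunk_len; i++) {
        sink->sum += chunk[i];
    }
    sink->chunks++;
}
```

On the board, the callback would presumably wrap `afe_handle->feed(afe_data, chunk)`, and the chunk length would come from the AFE (the esp-skainet examples size the feed buffer from `get_feed_chunksize()` multiplied by the number of feed channels). This also assumes that the WAV header has been skipped and that the file's sample rate, sample width, and channel layout match what the AFE is configured to receive from I2S. After the function returns, the feed task would go back to its normal I2S read loop.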