Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK

Pronunciation Assessment Result Not as Expected

JiyouShin opened this issue

I’m working on integrating Azure AI's Pronunciation Assessment API into my project. I’ve managed to capture audio from the user's microphone and send it to the API. However, the results I'm receiving don't seem to align with the expected output. I’d appreciate any insights or suggestions on potential issues in my implementation.

```javascript
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
mediaRecorderRef.current = new MediaRecorder(stream);
recordedChunksRef.current = [];

mediaRecorderRef.current.ondataavailable = (event) => {
  if (event.data.size > 0) {
    recordedChunksRef.current.push(event.data);
  }
};

const pushStream = sdk.AudioInputStream.createPushStream();
// Combine all chunks into a single array buffer
const blob = new Blob(recordedChunksRef.current, { type: 'audio/wav' });
const arrayBuffer = await blob.arrayBuffer();
// Convert the array buffer to a Uint8Array
const uint8Array = new Uint8Array(arrayBuffer);
// Write the Uint8Array to the push stream
pushStream.write(uint8Array);
// Close the push stream
pushStream.close();

var audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
```

The code above is how I get audio from the user and convert it into a format suitable for sdk.AudioConfig.
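One possible cause worth checking, offered as an assumption rather than a confirmed diagnosis: `MediaRecorder` normally produces compressed audio (for example WebM/Opus in Chromium-based browsers), and passing `{ type: 'audio/wav' }` to the `Blob` constructor only labels the bytes; it does not convert them. A push stream created without an explicit format expects 16 kHz, 16-bit, mono PCM, so compressed bytes would most likely be interpreted as noise. The sketch below decodes the recording with the Web Audio API, resamples it to 16 kHz mono, converts it to 16-bit PCM, and writes the result to a push stream created with `AudioStreamFormat.getWaveFormatPCM(16000, 16, 1)`. The helper names `floatTo16BitPCM`, `blobToPcm16kMono` and `createAudioConfigFromChunks` are hypothetical, the SDK module is passed in explicitly as `sdk`, and the chunks are assumed to be read only after the recorder's `stop` event has fired.

```javascript
// Sketch only: these helper names are hypothetical, not part of the Speech SDK.

// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Browser only: decode the recorder output (e.g. WebM/Opus) and
// resample it to 16 kHz mono by rendering it in an OfflineAudioContext.
async function blobToPcm16kMono(blob) {
  const ctx = new AudioContext();
  let decoded;
  try {
    decoded = await ctx.decodeAudioData(await blob.arrayBuffer());
  } finally {
    await ctx.close();
  }
  const targetRate = 16000;
  const length = Math.max(1, Math.ceil(decoded.duration * targetRate));
  const offline = new OfflineAudioContext(1, length, targetRate);
  const source = offline.createBufferSource();
  source.buffer = decoded;
  // The 1-channel destination down-mixes stereo input to mono.
  source.connect(offline.destination);
  source.start(0);
  const rendered = await offline.startRendering();
  return floatTo16BitPCM(rendered.getChannelData(0));
}

// `sdk` is the imported Speech SDK module, e.g.
// import * as sdk from "microsoft-cognitiveservices-speech-sdk";
async function createAudioConfigFromChunks(sdk, chunks, mimeType) {
  // Keep the recorder's real MIME type; the Blob type does not transcode.
  const blob = new Blob(chunks, { type: mimeType });
  const pcm = await blobToPcm16kMono(blob);
  // Declare the format explicitly instead of relying on the default.
  const format = sdk.AudioStreamFormat.getWaveFormatPCM(16000, 16, 1);
  const pushStream = sdk.AudioInputStream.createPushStream(format);
  pushStream.write(pcm); // write() takes an ArrayBuffer
  pushStream.close();
  return sdk.AudioConfig.fromStreamInput(pushStream);
}
```

In the browser this would be called once recording has finished, for example inside `mediaRecorderRef.current.onstop = async () => { const audioConfig = await createAudioConfigFromChunks(sdk, recordedChunksRef.current, mediaRecorderRef.current.mimeType); /* create the recognizer here */ };`. Resampling with an `OfflineAudioContext` keeps the sketch free of extra dependencies; capturing PCM directly (for example with `sdk.AudioConfig.fromDefaultMicrophoneInput()`) would avoid the conversion altogether.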

```javascript
function onRecognizedResult(result) {
  console.log(result);
  console.log("Pronunciation assessment for: ", result.text);
  var pronunciation_result = sdk.PronunciationAssessmentResult.fromResult(result);
  console.log("Accuracy score: ", pronunciation_result.accuracyScore, '\n',
      "Pronunciation score: ", pronunciation_result.pronunciationScore, '\n',
      "Completeness score: ", pronunciation_result.completenessScore, '\n',
      "Fluency score: ", pronunciation_result.fluencyScore, '\n',
      "Prosody score: ", pronunciation_result.prosodyScore
  );
  console.log("Word-level details:");
  _.forEach(pronunciation_result.detailResult.Words, (word, idx) => {
      console.log("    ", idx + 1, ": word: ", word.Word, "\taccuracy score: ", word.PronunciationAssessment.AccuracyScore, "\terror type: ", word.PronunciationAssessment.ErrorType, ";");
  });
  reco.close();
}

reco.recognizeOnceAsync(
  function (successfulResult) {
    onRecognizedResult(successfulResult);
  }
)
```

I implemented the result retrieval section as shown above, based on the sample JavaScript code provided.
However, the result I got looks like this:
Screenshot 2024-09-01 5:27:23 PM

Could you please help me identify if there are any issues with my implementation or if there are additional configurations required for accurate results?
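On the question of additional configuration: the handler above only reads scores, so what they mean also depends on how the recognizer was configured, which the snippets do not show. The sketch below is one way to set this up with the JavaScript Speech SDK: a `PronunciationAssessmentConfig` carrying the reference text the user was asked to read, applied to the recognizer before `recognizeOnceAsync` runs, plus a check of `result.reason` before any scores are read. The function names `assessPronunciationOnce` and `summarizeAssessment` and the `en-US` locale are assumptions for illustration, and `sdk` is passed in as the imported SDK module.

```javascript
// Sketch only: function names and the locale are assumptions.

// Turn a PronunciationAssessmentResult-like object into a plain summary.
function summarizeAssessment(pa) {
  const words = (pa.detailResult && pa.detailResult.Words) || [];
  return {
    accuracy: pa.accuracyScore,
    pronunciation: pa.pronunciationScore,
    completeness: pa.completenessScore,
    fluency: pa.fluencyScore,
    prosody: pa.prosodyScore,
    words: words.map((w) => ({
      word: w.Word,
      accuracy: w.PronunciationAssessment.AccuracyScore,
      errorType: w.PronunciationAssessment.ErrorType,
    })),
  };
}

// `sdk` is the imported Speech SDK; key, region and referenceText come from the app.
function assessPronunciationOnce(sdk, key, region, referenceText, audioConfig) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
  speechConfig.speechRecognitionLanguage = "en-US"; // assumed locale
  const paConfig = new sdk.PronunciationAssessmentConfig(
    referenceText, // the text the user was asked to read
    sdk.PronunciationAssessmentGradingSystem.HundredMark,
    sdk.PronunciationAssessmentGranularity.Phoneme,
    true // enableMiscue: flag omitted/inserted words against referenceText
  );
  paConfig.enableProsodyAssessment = true; // enables prosodyScore
  const reco = new sdk.SpeechRecognizer(speechConfig, audioConfig);
  paConfig.applyTo(reco); // apply before recognition starts

  return new Promise((resolve, reject) => {
    reco.recognizeOnceAsync(
      (result) => {
        reco.close();
        if (result.reason !== sdk.ResultReason.RecognizedSpeech) {
          // e.g. NoMatch when no speech was recognized in the audio
          reject(new Error(`Recognition failed, reason=${result.reason}`));
          return;
        }
        const pa = sdk.PronunciationAssessmentResult.fromResult(result);
        resolve({ text: result.text, ...summarizeAssessment(pa) });
      },
      (error) => {
        reco.close();
        reject(new Error(error));
      }
    );
  });
}
```

A call might look like `assessPronunciationOnce(sdk, key, region, "Good morning.", audioConfig).then(console.log, console.error);`. Wrapping the callbacks in a Promise keeps `reco.close()` on both the success and error paths instead of inside the result printer, and the `result.reason` check separates "recognition failed" from "recognition succeeded but scored low".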

Thank you!

@wangkenpu could you check?

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.