Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK

Microsoft Entra Authentication for Speech Service

cecheta opened this issue · comments

I have been trying to configure Microsoft Entra authentication for the Speech Service, as explained in these docs, yet it only seems to work from the browser.

My setup is Speech Service with public access disabled, and a private endpoint from a VNet. I have then connected to this VNet. The custom domain for the speech service has the same name as the resource itself.

I have run this sample, which uses the custom domain and Microsoft Entra for authentication, and this works well.

I then tried to use the same form of authentication in a Node example that runs speech-to-text on a file, but it doesn't work: I get no output.

const fs = require("fs");
const { DefaultAzureCredential } = require("@azure/identity");
const sdk = require("microsoft-cognitiveservices-speech-sdk");

(async () => {
  const SUBSCRIPTION_ID = ""
  const RESOURCE_GROUP = ""
  const SPEECH_SERVICE_NAME = ""
  const AUDIO_FILE_NAME = "audio1.wav"

  const token = (await new DefaultAzureCredential().getToken("")).token;
  const resourceId = `/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.CognitiveServices/accounts/${SPEECH_SERVICE_NAME}`;
  const speechToken = `aad#${resourceId}#${token}`;

  const speechConfig = sdk.SpeechConfig.fromEndpoint(new URL(`wss://${SPEECH_SERVICE_NAME}`));
  speechConfig.authorizationToken = speechToken;

  speechConfig.speechRecognitionLanguage = "en-US";

  const pushStream = sdk.AudioInputStream.createPushStream();

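  // Feed the audio file into the push stream, then signal the end of input.
  fs.createReadStream(AUDIO_FILE_NAME).on('data', function (arrayBuffer) {
    pushStream.write(arrayBuffer.slice());
  }).on('end', function () {
    pushStream.close();
  });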

  const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);

  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  recognizer.recognized = function (s, e) {
    console.log(`Recognised: ${e.result.text}`);
  };

  recognizer.recognizeOnceAsync(() => {
    recognizer.close();
  });
})();

If I change

const speechConfig = sdk.SpeechConfig.fromEndpoint(new URL(`wss://${SPEECH_SERVICE_NAME}`));
speechConfig.authorizationToken = speechToken;

to

const speechConfig = sdk.SpeechConfig.fromEndpoint(new URL(`wss://${SPEECH_SERVICE_NAME}`), SPEECH_SERVICE_KEY);

Then it works, and I get the transcription.

I also tried in Python:

import azure.cognitiveservices.speech as speechsdk
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = ""
RESOURCE_GROUP = ""
SPEECH_SERVICE_NAME = ""
AUDIO_FILE_NAME = "audio1.wav"

token = DefaultAzureCredential().get_token("").token
resource_id = f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.CognitiveServices/accounts/{SPEECH_SERVICE_NAME}"
speech_token = f"aad#{resource_id}#{token}"

speech_config = speechsdk.SpeechConfig(endpoint=f"wss://{SPEECH_SERVICE_NAME}")
speech_config.authorization_token = speech_token
speech_config.speech_recognition_language = "en-US"

audio_config = speechsdk.audio.AudioConfig(filename=AUDIO_FILE_NAME)

recognizer = speechsdk.SpeechRecognizer(speech_config, audio_config)

result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(f"Recognised: {result.text}")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print(f"Cancelled: {cancellation_details.reason}")
    print(f"Error: {cancellation_details.error_details}")

Which gives the following output:

Cancelled: CancellationReason.Error
Error: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. SessionId: 178b631d2deb4f10bf7d69de0dc7de1c

Why does it work from the browser but not when using Node/Python? Is it documented where it does and doesn't work?
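One way to narrow down a 401 like the one above is to look at the claims inside the Entra access token before handing it to the Speech SDK. The following is a minimal sketch, assuming the token is a standard three-part JWT; `decode_jwt_claims` is a hypothetical helper, not part of any SDK, and it does not verify the signature. The scope string passed to `get_token` is empty in the post, so the audience the token should carry cannot be confirmed from here.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the (unverified) claims from a JWT's payload segment.

    Hypothetical diagnostic helper: it only base64url-decodes the middle
    segment and parses it as JSON. It does NOT validate the signature.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url payloads usually omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Calling `decode_jwt_claims(token)` on the value returned by `DefaultAzureCredential().get_token(...)` and printing the `aud` and `exp` entries would show which resource the token was issued for and when it expires.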

To add to this, I've done some more testing, particularly with enabling and disabling the disableLocalAuth property on the speech service resource, and I'm getting mixed results.

For example, it works initially. I then set disableLocalAuth: true and it stops working. When I set disableLocalAuth: false again, it still doesn't work.

Also, it seems that if you just use the bare Entra token as the authorisation token, instead of f"aad#{resource_id}#{token}", then it works?
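To make it easier to compare the two token formats mentioned here, the sketch below builds both from the same inputs. The helper names `build_resource_id` and `build_speech_token` are hypothetical and only wrap the string formatting already used in the snippets above. Which form the service accepts in a given configuration is exactly what this issue leaves unresolved.

```python
from typing import Optional


def build_resource_id(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the ARM resource ID of a Cognitive Services (Speech) account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
    )


def build_speech_token(entra_token: str, resource_id: Optional[str] = None) -> str:
    """Return the authorization token string to assign to the speech config.

    With a resource_id, produce the 'aad#<resource id>#<token>' form used in
    the snippets above; without one, return the bare Entra token unchanged.
    """
    if resource_id is None:
        return entra_token
    return f"aad#{resource_id}#{entra_token}"
```

Assigning `build_speech_token(token, resource_id)` and then `build_speech_token(token)` to `speech_config.authorization_token` in two separate runs would show which form the endpoint accepts in a given setup.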

Closing, as I'm getting mixed and inconsistent results.