Microsoft Entra Authentication for Speech Service
cecheta opened this issue
I have been trying to configure Microsoft Entra authentication for the Speech Service, as explained in these docs, yet it only seems to work from the browser.
My setup is a Speech service with public network access disabled and a private endpoint in a VNet, which I have connected to. The custom domain for the Speech service has the same name as the resource itself.
I have run this sample, which uses the custom domain and Microsoft Entra for authentication, and this works well.
I've then tried to replicate the same form of authentication in a Node example that runs speech-to-text on a file, but it doesn't work: I get no output.
```js
const fs = require("fs");
const { DefaultAzureCredential } = require("@azure/identity");
const sdk = require("microsoft-cognitiveservices-speech-sdk");

(async () => {
  const SUBSCRIPTION_ID = ""
  const RESOURCE_GROUP = ""
  const SPEECH_SERVICE_NAME = ""
  const SPEECH_SERVICE_KEY = ""
  const AUDIO_FILE_NAME = "audio1.wav"

  const token = (await new DefaultAzureCredential().getToken("https://cognitiveservices.azure.com/.default")).token;
  const resourceId = `/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.CognitiveServices/accounts/${SPEECH_SERVICE_NAME}`;
  const speechToken = `aad#${resourceId}#${token}`;

  const speechConfig = sdk.SpeechConfig.fromEndpoint(new URL(`wss://${SPEECH_SERVICE_NAME}.cognitiveservices.azure.com/stt/speech/universal/v2`));
  speechConfig.authorizationToken = speechToken;
  speechConfig.speechRecognitionLanguage = "en-US";

  const pushStream = sdk.AudioInputStream.createPushStream();
  fs.createReadStream(AUDIO_FILE_NAME).on('data', function (arrayBuffer) {
    pushStream.write(arrayBuffer.buffer);
  }).on('end', function () {
    pushStream.close();
  });

  const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  recognizer.recognized = function (s, e) {
    console.log(`Recognised: ${e.result.text}`);
  };

  // Note: there is no `canceled` handler here, so an authentication
  // failure would produce no output at all.
  recognizer.recognizeOnceAsync(() => {
    recognizer.close();
  });
})();
```
If I change

```js
const speechConfig = sdk.SpeechConfig.fromEndpoint(new URL(`wss://${SPEECH_SERVICE_NAME}.cognitiveservices.azure.com/stt/speech/universal/v2`));
speechConfig.authorizationToken = speechToken;
```

to

```js
const speechConfig = sdk.SpeechConfig.fromEndpoint(new URL(`wss://${SPEECH_SERVICE_NAME}.cognitiveservices.azure.com/stt/speech/universal/v2`), SPEECH_SERVICE_KEY);
```

then it works, and I get the transcription.
I also tried in Python:
```python
import azure.cognitiveservices.speech as speechsdk
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = ""
RESOURCE_GROUP = ""
SPEECH_SERVICE_NAME = ""
AUDIO_FILE_NAME = "audio1.wav"

token = DefaultAzureCredential().get_token("https://cognitiveservices.azure.com/.default").token
resource_id = f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.CognitiveServices/accounts/{SPEECH_SERVICE_NAME}"
speech_token = f"aad#{resource_id}#{token}"

speech_config = speechsdk.SpeechConfig(
    endpoint=f"wss://{SPEECH_SERVICE_NAME}.cognitiveservices.azure.com/stt/speech/universal/v2",
)
speech_config.authorization_token = speech_token
speech_config.speech_recognition_language = "en-US"

audio_config = speechsdk.audio.AudioConfig(filename=AUDIO_FILE_NAME)
recognizer = speechsdk.SpeechRecognizer(speech_config, audio_config)

result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(f"Recognised: {result.text}")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print(f"Cancelled: {cancellation_details.reason}")
    print(f"Error: {cancellation_details.error_details}")
```
This gives the following output:

```
Cancelled: CancellationReason.Error
Error: WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. SessionId: 178b631d2deb4f10bf7d69de0dc7de1c
```
Why does it work from the browser but not when using Node/Python? Is it documented where it does and doesn't work?
To add to this, I've done some more testing, particularly around enabling and disabling the `disableLocalAuth` property on the Speech service resource, and I'm getting mixed results. For example, it works initially; I then set `disableLocalAuth: true` and it stops working; I then set `disableLocalAuth: false`, and it's still not working.

Also, it seems like if you just use the Entra token as the authorisation token, instead of `f"aad#{resource_id}#{token}"`, then it works?
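For concreteness, here's a minimal sketch of the two token formats I'm comparing (placeholder values only; in real code the token would come from `DefaultAzureCredential`, and whether the bare token is actually a supported format is exactly what I'm unsure about):

```python
# Sketch comparing the two authorization token formats.
# All values below are placeholders, not real identifiers.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-resource-group"
SPEECH_SERVICE_NAME = "my-speech-service"
entra_token = "<token from DefaultAzureCredential>"

resource_id = (
    f"/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.CognitiveServices/accounts/{SPEECH_SERVICE_NAME}"
)

# Format from the docs for custom-domain / private-endpoint scenarios:
aad_token = f"aad#{resource_id}#{entra_token}"

# The bare Entra token, which in my testing sometimes works instead:
bare_token = entra_token

print(aad_token.startswith("aad#"))  # True
```

Either string would then be assigned to `authorization_token` (Python) or `authorizationToken` (JS) on the speech config.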
Closing, as I'm getting mixed and inconsistent results.