Unable to do speaker diarization
Adarsh1999 opened this issue
I have been trying to use the speaker diarization offered by AWS Transcribe, working only from the material available online, which is: https://docs.aws.amazon.com/transcribe/latest/dg/how-diarization.html.
The docs are very unclear and don't make it obvious where to start. I still wrote some code by experimenting, but it always gives an internal error. Is there any way to get speaker diarization working, or a guide to follow?
from __future__ import print_function
import time
import boto3
import uuid
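# The Transcribe client must be in the same Region as the S3 bucket (us-east-2).
transcribe = boto3.client('transcribe', region_name='us-east-2')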
job_name = str(uuid.uuid4())
job_uri = "https://atris-bucket.s3.us-east-2.amazonaws.com/16000.wav"
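# ShowSpeakerLabels requires MaxSpeakerLabels to be set as well, and
# ShowSpeakerLabels and ChannelIdentification cannot both be True.
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    LanguageCode='en-US',
    MediaFormat='wav',
    MediaSampleRateHertz=16000,  # should match the file's real sample rate (or omit it)
    Media={'MediaFileUri': job_uri},
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3,
        'ChannelIdentification': False,
        'ShowAlternatives': False,
        # 'VocabularyFilterName': 'string' was the placeholder from the API
        # reference; unless a vocabulary filter with that name exists in your
        # account, the job cannot use it, so it is dropped here.
    },
)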
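# Poll until the job finishes. If it fails, the printed response contains
# a 'FailureReason' field explaining why.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)
print(status)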
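Note that the speaker labels are not part of the `get_transcription_job` response itself; that response only points at the transcript JSON via `Transcript.TranscriptFileUri`. Below is a minimal sketch of the remaining steps, assuming the documented diarization output layout (`results.items` for words and punctuation, `results.speaker_labels.segments` for per-word speaker labels) and that no `OutputBucketName` was set, so `TranscriptFileUri` is a temporary pre-signed URL. The helper names `wait_for_job`, `load_transcript` and `speaker_turns` are hypothetical, not part of boto3.

```python
import json
import time
import urllib.request


def wait_for_job(client, job_name, poll_seconds=5, timeout_seconds=900, sleep=time.sleep):
    """Poll get_transcription_job until the job finishes (hypothetical helper).

    `client` is assumed to be a boto3 Transcribe client, or anything exposing
    the same get_transcription_job method. Raises RuntimeError with the
    service's FailureReason on failure, TimeoutError if it takes too long.
    """
    waited = 0
    while True:
        job = client.get_transcription_job(TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            reason = job.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Job {job_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job {job_name} still {status} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


def load_transcript(job):
    """Download the transcript JSON (assumes a pre-signed TranscriptFileUri)."""
    with urllib.request.urlopen(job["Transcript"]["TranscriptFileUri"]) as resp:
        return json.load(resp)


def speaker_turns(transcript):
    """Group words into consecutive (speaker_label, text) turns."""
    results = transcript["results"]
    # Map each word's start_time to the speaker label diarization assigned it.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns = []  # each entry: [speaker_label, list_of_words]
    for item in results["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamp; glue it onto the previous word.
            if turns:
                turns[-1][1][-1] += word
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)
        else:
            turns.append([speaker, [word]])
    return [(speaker, " ".join(words)) for speaker, words in turns]
```

In the script above, the `while True` loop could then be replaced by `job = wait_for_job(transcribe, job_name)`, followed by `for speaker, text in speaker_turns(load_transcript(job)): print(f"{speaker}: {text}")`. Raising on `FAILED` surfaces the service's `FailureReason` directly instead of leaving it buried in the printed response.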