The following creates a container that takes an audio file from S3, runs the Whisper model to transcribe it, and writes the transcription back to S3.
It is intended to be run as a job in the AWS Batch service.
For AWS Batch to pull this image, it must be pushed to Docker Hub. Log in first using:
docker login
Then you can push the image (retagging it first with docker tag if it was built under a different local name) using:
docker push manuelsh/platic-whisper:latest
The container takes .flac files from S3 with a 16,000 Hz sample rate.
The Docker image uses the following environment and config variables.

In the config.py file:

WHISPER_MODEL: the Whisper model type, set to large.
S3_CREDENTIALS_PATH: HTTP path to the S3 credentials exposed by the EC2 IAM role.
S3_BUCKET: the S3 bucket where input and output files are stored.
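As a reference, config.py might look like the following minimal sketch. The IAM role name and bucket name are hypothetical placeholders; the credentials URL assumes the standard EC2 instance-metadata path for temporary role credentials.

```python
# config.py -- sketch of the container's static configuration.
# Role name and bucket name below are hypothetical examples.
WHISPER_MODEL = "large"

# EC2 instance metadata exposes temporary credentials for the attached
# IAM role under a path of this form (role name is an assumption):
S3_CREDENTIALS_PATH = (
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
    "my-batch-role"
)

S3_BUCKET = "my-whisper-bucket"
```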
Also, one must run the container with the following environment variables:

FILE_NAME: name of the file to transcribe. The output file will have the same name with a _result.json suffix.
LANGUAGE: the language of the audio file. To autodetect the language, set to auto.
TASK: the task to run. Set to transcribe to transcribe the audio file, or translate to translate the transcription to English.
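A minimal sketch of how main.py might read and validate these per-job variables (the function name, defaults, and the exact shape of the output key are assumptions; the README does not show the actual code):

```python
import os


def read_job_config(env=os.environ):
    """Read and validate the per-job environment variables.

    Hypothetical helper illustrating the contract described above.
    """
    file_name = env["FILE_NAME"]            # required, e.g. "audio.flac"
    language = env.get("LANGUAGE", "auto")  # "auto" triggers autodetection
    task = env.get("TASK", "transcribe")
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported TASK: {task}")
    # Assumption: the result key is the input name plus a _result.json suffix.
    output_name = file_name + "_result.json"
    return file_name, language, task, output_name
```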
As an example, the following command builds the image locally:
docker build -t whisper .
and, as an example, to run it (note that --entrypoint takes only the binary; the script arguments go after the image name):
docker run \
-e FILE_NAME=b3l01m3ahMQ9pR7DP9DSU5Ukba33_1960547e-d418-40c1-961b-805037a1645e.flac \
-e LANGUAGE=es \
-e TASK=transcribe \
--entrypoint python3 \
-it whisper -u main.py
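Since the container is meant for AWS Batch, a job would typically be submitted with the same variables passed as container overrides. The sketch below assembles such a submission as a dry run (the queue and job-definition names are hypothetical; replace the final echo with an actual call to the aws CLI to submit):

```shell
# Sketch: submit one transcription job to AWS Batch.
# "whisper-queue" and "whisper-jobdef" are hypothetical names.
FILE_NAME="example.flac"
OVERRIDES='{"environment":[{"name":"FILE_NAME","value":"'"$FILE_NAME"'"},{"name":"LANGUAGE","value":"auto"},{"name":"TASK","value":"transcribe"}]}'
CMD="aws batch submit-job --job-name whisper-job --job-queue whisper-queue --job-definition whisper-jobdef --container-overrides $OVERRIDES"
# Dry run: print the command instead of executing it.
echo "$CMD"
```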