jtangney / live-captioning

Uses the Google Cloud Speech API to transcribe an audio stream. Deploys a set of highly available services to GKE. Work in progress.

live-captioning

Uses the Google Cloud Speech-to-Text API to transcribe a live audio stream. App is deployed as a set of highly available services to GKE. Uses a leader-election pattern to maintain a stateful connection to the API. Work in progress.

Architecture

[Architecture diagram] Not all elements of this diagram are currently implemented.

Description

  • The core of the app is a regional GKE cluster, deployed over two zones
  • Three independent deployments within the cluster
    • Ingest: receives the source audio and stores it in Cloud Memorystore for Redis.
      • Deployed as a Service, with a load balancer. This provides a single ingest IP address.
    • Transcribe: reads the stored audio and performs streaming recognize requests against the Cloud Speech API. Transcription results are stored in Redis. More details below.
    • Outputs: consume the transcription results and perform business logic. There might be several different flavours of Output component, e.g.
      • Stream the transcriptions to a web app for review/edit
      • Persist the transcriptions for compliance/analysis
  • Each deployment is replicated across the two zones for increased availability
  • Cloud Memorystore for Redis is used for intermediate storage. This provides low-latency, in-memory storage suitable for real-time activities. It can be deployed in a highly available configuration (replicated over two zones).
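
As a rough illustration of the data flow described above, the sketch below wires the three stages together through an in-memory stand-in for Redis. The `FakeRedis` class, the key names, and the use of Redis lists as queues are assumptions made for illustration; the source does not say which Redis data structures the app actually uses, and `recognize` is a hypothetical placeholder for the Speech API call.

```python
from collections import deque

# Hypothetical key names; the real app may use different keys or structures.
AUDIO_KEY = "audio"
TRANSCRIPT_KEY = "transcripts"


class FakeRedis:
    """In-memory stand-in for the two Redis list operations used below."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, value):
        # Append to the tail of the list, creating it if needed.
        self._lists.setdefault(key, deque()).append(value)

    def lpop(self, key):
        # Pop from the head of the list; None if the list is empty or missing.
        queue = self._lists.get(key)
        return queue.popleft() if queue else None


def ingest(store, chunk):
    """Ingest stage: queue a chunk of source audio."""
    store.rpush(AUDIO_KEY, chunk)


def transcribe(store, recognize):
    """Transcribe stage: drain queued audio and store each result."""
    while True:
        chunk = store.lpop(AUDIO_KEY)
        if chunk is None:
            break
        store.rpush(TRANSCRIPT_KEY, recognize(chunk))


def output(store):
    """Output stage: consume all pending transcription results."""
    results = []
    while True:
        text = store.lpop(TRANSCRIPT_KEY)
        if text is None:
            break
        results.append(text)
    return results
```

Because each stage only talks to the store, the stages can be scaled and restarted independently, which is the property the three separate deployments rely on.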

Transcribe details

  • Performs streaming recognize requests against the Cloud Speech API
  • This opens a bi-directional gRPC stream to the API; audio is sent to the API, and transcription results are asynchronously received. The Speech API client libraries abstract away the gRPC logic.
  • The connection to the API is stateful; transcription results can evolve as more audio is received by the API.
  • While the Transcribe deployment is replicated across zones, only a single pod (the leader) communicates with the API at any given time.
    • This is achieved using a leader election pattern.
    • The Kubernetes Go client provides some built-in logic to do leader election.
    • Simply put, the deployed Transcribe pods compete to acquire a lock that is managed by the Kubernetes control plane. One pod is elected as leader for a defined period of time. The leader continually “heartbeats” to renew its position as the leader, and the other pods periodically make new attempts to become the leader. This ensures that a new leader is identified quickly if the current leader fails.
    • The leader election is performed by a sidecar container, so the leader election logic is kept separate from the core transcription logic
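
The lock-and-heartbeat behaviour described above can be sketched with a small in-memory simulation. This is not the Kubernetes Go client's leader election code; `LeaseLock`, `simulate`, and the integer clock are hypothetical names chosen for illustration, and a real lock lives in the Kubernetes control plane rather than in process memory.

```python
class LeaseLock:
    """In-memory stand-in for a lock record managed by the control plane."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.renewed_at = None

    def try_acquire_or_renew(self, candidate, now):
        """Return True if candidate holds the lock after this attempt."""
        expired = (
            self.holder is None
            or now - self.renewed_at >= self.lease_seconds
        )
        if expired or self.holder == candidate:
            # Either the lease lapsed (take over) or we are the leader
            # sending a heartbeat (renew).
            self.holder = candidate
            self.renewed_at = now
            return True
        return False


def simulate(pods, lease_seconds, ticks, failures):
    """Return the recorded lock holder at each tick.

    failures maps a pod name to the tick at which it stops heartbeating.
    """
    lock = LeaseLock(lease_seconds)
    holders = []
    for now in range(ticks):
        for pod in pods:
            if now >= failures.get(pod, ticks):
                continue  # a failed pod makes no further attempts
            lock.try_acquire_or_renew(pod, now)
        holders.append(lock.holder)
    return holders
```

With two pods, a three-tick lease, and pod `a` failing at tick 2, `simulate(["a", "b"], 3, 8, {"a": 2})` keeps `a` as the recorded holder until its lease lapses, after which `b` takes over. A shorter lease gives faster failover at the cost of more frequent renewals.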

Deploy & test the app

These instructions assume you are working on your local machine rather than in Cloud Shell, because getting the audio packages to work in Cloud Shell is a hassle. You still create the infrastructure in GCP, but you run the commands locally. These instructions are for Mac; adapt them as appropriate for other platforms (e.g. the sed commands: GNU sed on Linux takes -i without the empty '' argument).

Setup, create infrastructure

  • Create a new GCP Project, and set it as the default. You'll need to give it a unique name. Supply your Organization ID if you have one.
    • gcloud projects create %your-project-id --set-as-default [--organization=%your-org-id]
  • Export the Project ID as a shell variable
    • export PROJECT_ID=$(gcloud config get-value project)
  • Enable billing; this is required.
    • gcloud beta billing projects link $PROJECT_ID --billing-account=%your-billing-account
  • Enable the relevant APIs. This can take a few mins.
    • gcloud services enable speech.googleapis.com container.googleapis.com redis.googleapis.com cloudbuild.googleapis.com
  • Create a GKE cluster. Change the zones to your preference. This can take a few minutes.
    • gcloud container clusters create captioning-cluster --cluster-version=1.14.7 --region=europe-west1 --scopes=gke-default,cloud-platform --machine-type=n1-highcpu-2 --num-nodes=1 --node-locations=europe-west1-b,europe-west1-d --enable-ip-alias
  • Get the cluster credentials
    • gcloud container clusters get-credentials captioning-cluster --region=europe-west1
  • Create a Cloud Memorystore instance. This can take a few mins
    • gcloud redis instances create redis-captions --tier=standard --region=europe-west1 --zone=europe-west1-b
  • Export the IP address of the Memorystore instance
    • export REDIS_HOST=`gcloud redis instances describe redis-captions --region=europe-west1 | sed -n 's/host: //p'`

Deploy the app

  • Clone this repo
    • git clone https://github.com/jtangney/live-captioning.git
  • Change directory
    • cd live-captioning
  • Build the Docker container images. They will be pushed to your project's container registry.
    • gcloud builds submit --config cloudbuild.yaml
  • Edit the yaml files to set your project ID
    • sed -i '' "s/mynewproject/$PROJECT_ID/" k8s/*.yaml
  • Edit the yaml files to set the IP address of the Cloud Memorystore instance
    • sed -i '' "s/redisHost=.*/redisHost=$REDIS_HOST/" k8s/*.yaml
  • Create the Deployments and Services in the cluster
    • kubectl apply -f k8s/ingestor.yaml,k8s/transcriber.yaml,k8s/editor.yaml
  • Verify that the 3 Deployments (ingestor, transcriber, editor) have been created.
    • kubectl get deployments
  • Verify that 2 Services (ingestor, editor) have been created.
    • kubectl get services
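
For platforms where the Mac-style `sed -i ''` commands above do not work, the two substitutions can also be done with a short, portable script. This is a minimal sketch using only the Python standard library; `substitute_in_files` is a hypothetical helper, not part of the repo. Unlike the sed commands, which replace only the first match on each line, it replaces every match.

```python
import glob
import re


def substitute_in_files(pattern, replacement, file_glob):
    """Replace every regex match of pattern in each file matching file_glob.

    Files are rewritten in place. Returns the sorted list of paths that
    actually changed.
    """
    changed = []
    for path in sorted(glob.glob(file_glob)):
        with open(path, encoding="utf-8") as f:
            original = f.read()
        # A function replacement keeps the replacement text literal, so
        # backslashes in it are not treated as regex escapes.
        updated = re.sub(pattern, lambda match: replacement, original)
        if updated != original:
            with open(path, "w", encoding="utf-8") as f:
                f.write(updated)
            changed.append(path)
    return changed
```

For example, calling `substitute_in_files("mynewproject", project_id, "k8s/*.yaml")` and then `substitute_in_files("redisHost=.*", "redisHost=" + redis_host, "k8s/*.yaml")` from the repo root mirrors the two sed steps, where `project_id` and `redis_host` hold the values exported earlier.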

Test

  • Get the external IP of the Ingest service
    • export INGEST_IP=`kubectl get services ingestor-service -o yaml | sed -n "s/- ip: //p"`
  • Verify that the Ingest service is up and running. You should see a 'hello' message
    • curl $INGEST_IP; echo
  • Get the external IP of the Editor service
    • kubectl get services editor-service -o yaml | sed -n "s/- ip: //p"
  • Open a browser window to the IP address above. Transcriptions will be written to this web page.
  • The test client is a Python script that uses PyAudio. PyAudio requires a system audio package (PortAudio) to be installed. The command below assumes you are on a Mac and use Homebrew.
    • brew install portaudio
  • Set up the client. This assumes you have Python 3.7 and virtualenv installed.
    • cd client
    • virtualenv venv
    • source venv/bin/activate
    • pip install -r requirements.txt
  • Play a local audio file and send the audio to the Ingest IP. The file plays continuously.
    • python socketio_client.py --targetip=$INGEST_IP --file=pager-article-snippet.wav
  • Switch to the browser window that has the Editor app; you should see transcriptions coming through

Experiment

  • You can change the audio file that is played via the --file option. Refer to the Speech API best practices for details of supported audio. The k8s/transcriber.yaml file defines the expected audio configuration; if you need to change the config, update the yaml and redeploy.

Languages

Go 59.2%, Python 28.6%, HTML 9.8%, Dockerfile 2.4%