flumi3 / speech-to-text

Transcribe audio files with Azure Cognitive Services

Speech To Text

This simple Python project converts the audio in a file into searchable text using the cloud speech recognition of Azure Cognitive Services.

Requirements

  • Python 3
  • Instance of Azure Speech Service
  • Recommended audio format:
    • type: WAV (required)
    • precision: 16-bit
    • sample rate: 8kHz or 16kHz
    • channel: mono

Getting started

Set up the Azure Speech service

  1. Create a free Azure subscription
  2. Create a free instance of the Speech service (5 audio hours per month)

Prepare the audio

By default, recognition expects WAV input (16 kHz or 8 kHz sample rate, 16-bit, mono PCM). If your audio is in another format, you can convert it with this Online Audio Converter.
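Before uploading, it can help to confirm that a file actually matches the format described above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of this project.

```python
import wave


def check_wav_format(path):
    """Return a list of deviations from the recommended WAV format.

    An empty list means the file is mono, 16-bit, and 8 kHz or 16 kHz.
    Hypothetical helper for illustration; not part of this repository.
    """
    problems = []
    # wave.open raises wave.Error if the file is not an uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()   # samples per second

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if sample_rate not in (8000, 16000):
        problems.append(f"expected 8 kHz or 16 kHz, found {sample_rate} Hz")
    return problems
```

Calling `check_wav_format("audio.wav")` on a conforming file should return an empty list; any entries in the list describe what to fix during conversion.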

Set up the environment

  1. Create a virtual environment for installing the dependencies

    python3 -m venv venv
  2. Activate the virtual environment

    # Linux / macOS
    source venv/bin/activate
    
    # Windows
    .\venv\Scripts\activate
  3. Install the dependencies

    pip install -r requirements.txt

Provide your configuration

  1. Get the API key and region of your Speech service resource
  2. Enter the API key and region into env_sample.txt
  3. Enter the input path, output path, and language of your audio file into env_sample.txt
  4. Rename the file to .env
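To catch a missing value before starting a transcription, the finished .env file can be checked with a few lines of Python. This is only a sketch under assumptions: the key names in `REQUIRED_KEYS` are hypothetical placeholders (use the names that actually appear in env_sample.txt), and the project itself may load the file differently.

```python
from pathlib import Path

# Hypothetical key names for illustration only; replace them with the
# names that actually appear in env_sample.txt.
REQUIRED_KEYS = ("API_KEY", "REGION", "INPUT_PATH", "OUTPUT_PATH", "LANGUAGE")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not settings.get(key)]
```

For example, `missing_keys(read_env_file(".env"))` returns the names of any settings that still need a value.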

Run the transcription

python3 transcription.py
