MunchProductionz / Speech-to-Text

Speech-to-Text

A project to learn speech-to-text APIs and automatic report generation using:


Virtual environment

Y though?

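Python is bad at managing dependencies, especially when every package is installed at the global level, where projects can end up needing conflicting versions. We use virtual environments to give each project its own isolated set of packages.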

Virtual environment setup

Virtual environment created according to this guide

  1. Setup a virtual environment:

    • python -m venv venv
  2. Activate it:

    • source venv/bin/activate (on Windows: venv\Scripts\activate)
    • if successful, your terminal prompt should look like this: (venv) $
  3. Install packages using python -m pip install -r requirements.txt

    • This should automatically install all relevant packages
  4. Run program

  5. Deactivate virtual environment with deactivate
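
To confirm from inside Python that the activation step worked, a small check like the one below may help. It is only a sketch: the function name in_virtualenv is made up for illustration, and the check relies on the standard-library behaviour that sys.prefix differs from sys.base_prefix inside an environment created by venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the first line prints False after activation, the python on your PATH is probably not the one inside venv.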

Packages

If using pip, install the following packages:

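  • whisper (note: on PyPI this name belongs to an unrelated Graphite database library; the speech model itself comes from openai-whisper below)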
  • openai
  • openai-whisper
  • ffmpeg

How to generate requirements.txt

  1. Initiate virtual environment according to previous section

  2. Run python -m pip freeze > requirements.txt
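
For a rough idea of what ends up in requirements.txt, the sketch below lists the installed distributions in the same name==version form using only the standard library. It is an approximation, not a replacement for pip freeze: the helper name frozen_requirements is hypothetical, and editable or VCS installs are not handled the way pip handles them.

```python
from importlib import metadata


def frozen_requirements() -> list[str]:
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that carry no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Printed instead of written to disk, so an existing requirements.txt is untouched.
    print("\n".join(frozen_requirements()))
```

Run it with the virtual environment active, otherwise it reports the global packages.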

Further reading in package management

An environment built with venv and pip is not deterministic, and we may encounter errors in the future. Poetry is an alternative that can be used:

To start a poetry shell, use:

  • poetry shell

To deactivate and exit the shell, use:

  • exit

To deactivate only the virtual environment without leaving the shell, use:

  • deactivate

To run a single script with poetry, use:

  • poetry run python your_script.py

Whisper AI

Usage

Whisper AI offers multiple model sizes that trade speed against transcription quality. The medium model is a good balance between the two.
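
A minimal transcription sketch with the medium model could look like this. It assumes the openai-whisper package and ffmpeg are already installed; the wrapper name transcribe_file is hypothetical, while whisper.load_model and model.transcribe are the calls documented by the package.

```python
from pathlib import Path


def transcribe_file(audio_path: str, model_name: str = "medium") -> str:
    """Transcribe one audio file and return the recognized text."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"audio file not found: {path}")

    # Imported lazily so the path check above works even without the package.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(str(path))    # dict with "text", "segments", "language"
    return result["text"].strip()
```

Calling transcribe_file("test.m4a") from the repository root should return the transcript as a single string. Passing a smaller model name such as "base" or "small" is faster at the cost of accuracy.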

Installing Whisper AI

Follow this guide to install Whisper AI:

  • https://pypi.org/project/openai-whisper/

Then activate the virtual environment and install the Whisper package. When using pip, run:

  • pip install -U openai-whisper

Make sure ffmpeg is installed on your computer. If it is not, download the latest version of ffmpeg (first link) and follow the guides (second and third links) to add the ffmpeg binary to your PATH environment variable:

  • https://ffmpeg.org/download.html
  • https://www.youtube.com/watch?v=5xgegeBL0kw
  • https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/
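
Whether the ffmpeg binary is actually reachable through PATH can be checked from Python before running Whisper. The sketch below uses only the standard library; the helper names find_ffmpeg and ffmpeg_version_line are made up for illustration.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    exe = find_ffmpeg()
    if exe is None:
        return None
    done = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = done.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "ffmpeg not found on PATH")
```

If this reports that ffmpeg is missing right after installation, open a new terminal so the updated PATH is picked up.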

Proceed to install ffmpeg as a Python package using:

  • pip install ffmpeg

When using Windows, ensure that you have Chocolatey installed. If not, follow this guide:

  • https://chocolatey.org/install

Large Files

There are 2 files that are too big for GitHub (above 100 MB), and we therefore need to use Git LFS. Start by following this guide:

  • https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

When using a virtual environment, 2 files are too big for GitHub. Get around this by discarding the following changes before committing:

  • dnnl.lib
  • torch_cpu.dll
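
To see which files in a working tree exceed the limit before committing, a scan along these lines may be useful. It is a sketch under assumptions: the limit is taken as 100 * 1024 * 1024 bytes, the .git directory is skipped, and the function name find_large_files is hypothetical.

```python
import os

# Assumed per-file limit: 100 * 1024 * 1024 bytes.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root: str, limit: int = GITHUB_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (path, size) pairs for files under root larger than limit, biggest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are not stored as large blobs
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # file vanished or is unreadable
            if size > limit:
                hits.append((full, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / 1024 / 1024:8.1f} MiB  {path}")
```

Run from the repository root with the virtual environment folder present, this would be expected to list files such as dnnl.lib and torch_cpu.dll.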

Follow this guide to use Git LFS:

  • https://www.youtube.com/watch?v=9HCsSD5PMSk

Use Git to open the repository and use:

  1. git lfs track "FILE.NAME"
  2. git lfs push --all origin main
  3. git add .
  4. git commit -m "COMMIT MESSAGE"
  5. git push -u origin main

Updating Whisper AI

To update the package to the latest commit of the openai/whisper repository, run:

  • pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

Testing

Test files

The repository contains a test file named test.m4a. The actual text can be found on the following website under the title "Why tunnels?":

  • https://www.boringcompany.com/
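
One way to score a transcription of test.m4a against the reference text is the word error rate: the word-level edit distance divided by the number of reference words. The sketch below is an illustration, not part of the repository. The function names are hypothetical, and the normalization (lowercasing, keeping only ASCII letters, digits and apostrophes) is an assumption that suits English text.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")
    # previous[j] holds the distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

A value of 0.0 means the transcript matches the reference word for word after normalization. Values can exceed 1.0 when the transcript contains many extra words.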

Useful development plugins
