kumar-abhishek / fsdl-talk-to-transformer

See how a modern neural network completes your spoken words/sentences. Speak a few words to try this out.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Talk-To-Transformer

Introduction

An application that talks back to you when you seed it with some words. Under the hood, a transformer model is used for text generation.

This app does these 3 things in sequence:

  1. Converts speech to text using Wave2Vec model from Huggingface
  2. Text generation using DistilGPT-2, which is another model from Huggingface library
  3. Converts text to speech using the Tacotron model from Coqui library

What can this app do?

This app is inspired from the pre-existing web-app, which is a text only version. This app takes it to the next level where you can talk and listen to the generated text.

Setting up the project

1. Clone the repo

$ git clone https://github.com/kumar-abhishek/fsdl-talk-to-transformer
$ cd fsdl-talk-to-transformer

2. Installation

You can install the required packages using any one of the 3 options below. However, Option 1 is well tested and recommended. If you have to use other options, you would likely have to tweak the list of versions/packages in environment.yml from Option 1.

Please also note that since this has been developed and tested on a Mac OS, you may encounter some minor installation issues on other operating systems.

Option 1: Using Conda - Recommended

  • Download and install miniconda or anaconda if you don't have conda installed in your system.

  • Create a new environment 'tf' using the following command:

    # if you need to install conda on ubuntu/debian based OS
    eval "$(/root/miniconda3/bin/conda shell.bash hook)" 
    
    $ conda env create -f environment.yml
    
    # to update the environment, make sure have already activated the environment
    conda env update --file environment.yml

    If an error like the one shown below occurs:

    ResolvePackageNotFound:
        - appnope=0.1.0
        - libcxx=4.0.1

    Just remove those packages from the file environment.yml and rerun the above command. Then activate the environment by

    $ conda activate tf

Option 2: Using setup.sh

  • If you face trouble installing packages, you may install using setup.sh. If you use bash shell instead of zsh, edit line 2 of setup.sh by replacing zsh with bash. i.e.

    eval "$(conda shell.zsh hook)"

    Then run the following commands

    $ chmod +x setup.sh
    $ ./setup.sh

Option 3: Using Virtual Environment

  • Install virtualenv using pip and create a virtual environment '.venv'

    $ pip install virtualenv
    $ virtualenv .venv
  • Activate the virtual environment '.venv' and install required packages

    $ source .venv/bin/activate
    $ pip install -r requirements.txt

Configuring the App (settings.py)

  • First, check the info of the audio recording device of your system by running

    $ python -m src.sound

    You will receive output something like this:

    pygame 1.9.6
    Hello from the pygame community. https://www.pygame.org/contribute.html
    src.sound - INFO - List of System's Audio Devices configurations:
    src.sound - INFO - Number of audio devices: 2
    src.sound - INFO - [('index', 0), ('name', 'MacBook Pro Microphone'), ('maxInputChannels', 1), ('defaultSampleRate', 44100.0)]
    src.sound - INFO - [('index', 1), ('name', 'MacBook Pro Speakers'), ('maxInputChannels', 0), ('defaultSampleRate', 44100.0)]
    
    src.sound - INFO - Audio device configurations currently used
    src.sound - INFO - Default input device index = 0
    src.sound - INFO - Max input channels = 1
    src.sound - INFO - Default samplerate = 44100
  • Check if the index, maxInputChannels and defaultSampleRate of your recording device or microphone (eg. MacBook Pro Microphone) matches with the device configurations currently used (both displayed in the output). The configurations for my recording device is:

    index = 0
    maxInputChannels = 1
    defaultSampleRate = 44100.0
  • Open settings.py and modify the values accordingly in line numbers 38 to 40

    # Audio configurations
    INPUT_DEVICE = 0
    MAX_INPUT_CHANNELS = 1  # Max input channels
    DEFAULT_SAMPLE_RATE = 44100   # Default sample rate of microphone or recording device

Running the Talk to Transformer Webapp

  • Execute the python file app.py using streamlit

    $ streamlit run app.py
  • The webapp is launched in your browser and opened automatically as shown below. You may also open it by visiting http://localhost:8501

    Talk to Transformer App demo.
  • Click Record and say something. It records for 5 seconds(configurable) and saves the output wav file to recording/recorded.wav.

  • Click Play to listen to the recorded speech.

  • Click Play generated text to play the generated text

Training the model (Optional)

If you want to experiment by training the model used for text generation(in generate_text.py file) yourself with your own data or the data used currently, follow the steps below:

  • Open the file notebooks/train_text_generator.ipynb in Google Colab(link at the top of the file) and run all the cells.
  • You would have to save the trained model and use that model in generate_text.py instead of using the pre-trained model from huggingface library.

Report Issues

If you have any issues with the app, please report it here: Issues

License

Talk-To-Transformer is licensed under the GNU GPLv3 license.

credits: Heavily influenced from this repo

About

See how a modern neural network completes your spoken words/sentences. Speak a few words to try this out.

License:GNU General Public License v3.0


Languages

Language:Python 71.1%Language:Jupyter Notebook 28.0%Language:Shell 0.8%