phitrann / automatic-speech-recognition

Automatic speech recognition, speaker diarization, and grammar/pronunciation assessment

Repository on GitHub: https://github.com/phitrann/automatic-speech-recognition

Automatic Speech Recognition

This project builds a system that automatically transcribes speech to text from various sources such as YouTube videos and audio files. The system is built with the NeMo toolkit, NVIDIA's toolkit for state-of-the-art conversational AI models.

Supported functions:

  • Collect data from YouTube
  • Process data
  • Automatic Speech Recognition (ASR)
  • Speaker Diarization
  • Pronunciation/Grammar Assessment

Getting started

I recommend using Anaconda to create the environment:

    conda create -n asr python=3.10
    conda activate asr

Clone the repository and install ffmpeg

    git clone https://github.com/Foxxy-HCMUS/automatic-speech-recognition.git
    sudo apt-get install ffmpeg

    pip install git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
    pip install dora-search lameenc openunmix wget Cython
    pip install --no-build-isolation "nemo_toolkit[asr]==1.23.0"
    pip install --no-deps git+https://github.com/facebookresearch/demucs#egg=demucs
    pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git
    pip install ctranslate2==3.24.0
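
After installing, a quick import check helps confirm the environment is consistent. This is a generic sketch, not repository-specific code:

    # Sanity check: the core dependencies should import without errors.
    # Generic sketch only; nothing here is specific to this repository.
    import torch
    import whisperx
    import ctranslate2
    import nemo.collections.asr as nemo_asr
    from deepmultilingualpunctuation import PunctuationModel

    print("torch:", torch.__version__)
    print("ctranslate2:", ctranslate2.__version__)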

Install requirements

    # Install in editable mode to avoid constant re-installation
    # Also include all optional dependencies
    python -m pip install -e .[all]

    # Install pre-commit hooks to automatically check/format code on commits
    pre-commit install

If PyTorch is not compiled with CUDA support, run the following command

    pip install torch==1.13.1+cu116 torchaudio==0.13.1 torchvision==0.14.1+cu116 --extra-index-url=https://download.pytorch.org/whl/cu116
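
To confirm the CUDA build is actually picked up, a generic PyTorch check like the following should report the +cu116 build and print True on a GPU machine:

    # Generic CUDA check (not repository-specific).
    import torch
    print(torch.__version__)          # should show a +cu116 build
    print(torch.cuda.is_available())  # should print True if CUDA is usable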

Guidelines

Automatic Speech Recognition and Speaker Diarization

  • Please visit the notebook task_1.ipynb and run all cells to see the full pipeline for ASR and Speaker Diarization; a standalone sketch of the core ASR call follows below.
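
The sketch below shows what the core ASR call looks like with a pretrained NeMo model. The model name and file path are placeholders, not necessarily those used in task_1.ipynb:

    # Minimal ASR sketch with a pretrained NeMo model.
    # "stt_en_conformer_ctc_large" and "sample.wav" are placeholders,
    # not necessarily what task_1.ipynb uses.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")
    transcripts = asr_model.transcribe(["sample.wav"])  # expects 16 kHz mono audio
    print(transcripts[0])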

Pronunciation and Grammar Assessment

  • Currently, the system is in development and will be available soon. Code for this task is in the task_2.ipynb notebook.

CLI for Data Preparation

  • Collect (a generic sketch of this step is shown after this list)
    python src/asr/collect_data.py
  • Preprocess
    python src/asr/parser.py
  • Clean up
    ./clean_up.sh
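
The collection step presumably pulls audio from YouTube; a generic yt-dlp sketch of that idea is shown below. It is not the repository's collect_data.py, and the URL, output template, and yt-dlp dependency are assumptions for illustration only.

    # Illustrative only: generic YouTube audio download with yt-dlp.
    # Not the repository's collect_data.py; URL and options are placeholders.
    import yt_dlp

    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": "data/raw/%(id)s.%(ext)s",
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download(["https://www.youtube.com/watch?v=example"])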
