PabloLION / whisper-note

Key Features

  • Live Speech Recognition and Transcription: Automatically transcribe spoken words in real time (cannot be turned off).
  • Detailed Live Transcript: View a live transcript of your conversation, including translations, and see the queue length.
  • Select Input Language: Choose your preferred input language.
  • Optional Real-Time Translation: Get instant translations of spoken content if needed.
  • Choose Model Size: Select the model size that suits your needs.
  • Language-Specific Models: Utilize models tailored for specific languages.
  • Export Trimmed Recordings: Easily save trimmed .wav files of your recordings without silent gaps.
  • Export Transcript History: Save your entire transcript history as an .html file.
  • High-Quality Transcription: Optionally receive high-quality transcription after the recording is completed.
  • Upcoming Feature: Stay tuned for an optional summary of the transcript generated with the help of ChatGPT.

Install

Mac with Apple Silicon

brew install portaudio # fixes: src/pyaudio/device_api.c:9:10: fatal error: 'portaudio.h' file not found
brew install ffmpeg
brew install mbedtls # expected to provide /opt/homebrew/Cellar/mbedtls/3.4.1/lib/libmbedcrypto.13.dylib

After this, my mbedtls@3.4.1 provided only libmbedcrypto.14.dylib, so I manually copied it to the expected name, libmbedcrypto.13.dylib:

# for mbedtls@3.4.1
cp /opt/homebrew/Cellar/mbedtls/3.4.1/lib/libmbedcrypto.14.dylib /opt/homebrew/Cellar/mbedtls/3.4.1/lib/libmbedcrypto.13.dylib
# for mbedtls@3.5.0
cp /opt/homebrew/Cellar/mbedtls/3.5.0/lib/libmbedcrypto.15.dylib /opt/homebrew/Cellar/mbedtls/3.5.0/lib/libmbedcrypto.13.dylib
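
The manual copy above can also be scripted so that it does not depend on the exact mbedtls version. The following is a minimal Python sketch and not part of this repo: alias_mbedcrypto is a hypothetical helper name, and it assumes the only mismatch is the number in the file name, as in the cp commands above.

```python
import re
import shutil
from pathlib import Path


def alias_mbedcrypto(lib_dir, wanted=13):
    """Copy the highest-numbered libmbedcrypto.N.dylib to libmbedcrypto.<wanted>.dylib.

    Hypothetical helper: it mirrors the manual `cp` workaround and does
    nothing if the wanted file already exists.
    """
    lib_dir = Path(lib_dir)
    target = lib_dir / f"libmbedcrypto.{wanted}.dylib"
    if target.exists():
        return target

    # Match only names like libmbedcrypto.14.dylib (a single integer),
    # not libmbedcrypto.dylib or libmbedcrypto.3.4.1.dylib.
    pattern = re.compile(r"^libmbedcrypto\.(\d+)\.dylib$")
    candidates = []
    for path in lib_dir.iterdir():
        match = pattern.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))

    if not candidates:
        raise FileNotFoundError(f"no libmbedcrypto.N.dylib found in {lib_dir}")

    # Pick the file with the largest N and copy it under the wanted name.
    _, newest = max(candidates, key=lambda item: item[0])
    shutil.copy2(newest, target)
    return target
```

For example, alias_mbedcrypto("/opt/homebrew/Cellar/mbedtls/3.5.0/lib") would do the same as the second cp command above, provided that directory exists on your machine.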

Then set up the virtual environment and install the requirements with Poetry:

poetry install

Mac with Intel

(mbedtls has since been updated to 3.5.0, so the library version number differs from the one above)

cp /usr/local/opt/mbedtls/lib/libmbedcrypto.15.dylib /usr/local/opt/mbedtls/lib/libmbedcrypto.13.dylib

I got the error dyld[49347]: Library not loaded: '@loader_path/../../../../Python.framework/Versions/3.11/Python' during the Poetry installation; installing Poetry with pip install poetry solved it. The Homebrew version of Poetry does not work.

Not supported

I don't know how to install on these platforms.

  • Windows
  • Linux
  • Mac with Intel (apart from the partial notes above)

Use

  • The large model makes the script slow: even on an M1 Ultra with 128 GB RAM, recognition cannot keep up with a person speaking continuously.
  • The small model is recommended: it is faster and the recognition quality is still decent.
  • See the comments in config.yml for more details.

Dev

Env setup

  • This assumes Poetry is installed on your machine with python@3.11.
  • It also assumes the working directory (pwd) is the root of this repo.

poetry install --with dev
poetry run pre-commit install
touch .env

  • Get an API key from DeepL and put it in .env in the root of this repo, like this:

    DEEPL_API_KEY=1234567890

  • Optionally set up your own config file. #TODO: default config file

  • #TODO: use make to make this easier
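
To check that the DEEPL_API_KEY entry in .env is picked up, something like the following can be used. This is a minimal standard-library sketch, not the way this repo actually loads the key (it may use a dedicated package for that); load_env_file and get_deepl_api_key are hypothetical names, and the parser only handles simple KEY=value lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file into a dict.

    Hypothetical helper: blank lines, comment lines starting with '#',
    and lines without '=' are skipped.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_deepl_api_key(env_path=".env"):
    """Return DEEPL_API_KEY from the process environment, else from the .env file."""
    key = os.environ.get("DEEPL_API_KEY") or load_env_file(env_path).get("DEEPL_API_KEY")
    if not key:
        raise RuntimeError("DEEPL_API_KEY is not set; add it to .env")
    return key
```

Letting the process environment take precedence over the file is a design assumption here; it makes it easy to override the key for a single run without editing .env.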

Memo

Most of this should be converted to GitHub Issues when published.

  • Trying to use result to handle errors; not sure yet how it feels.
  • The idea is to build a substitute for Otter for taking notes.
    • Check out and try the speech recognition package
  • Features:
    • summary of the text with ChatGPT
    • generate .SRT subtitles
    • Add a DEV_MODE env variable; in that mode the logger should be more verbose
  • UI:
    • start/end control
    • Not needed: show a "Still Recording..." indicator for every 5 seconds the input is idle.
  • For translation, DeepL seems to be the best option, but it's not free. Since I don't need translation myself, I'm only doing the most basic thing: translating the text with some API.
  • I tried textual, but its CSS is not applied to dynamically rendered list items, and it is not easy to use. Maybe use Electron or Eel instead if we want a web UI.
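
The DEV_MODE idea from the feature list above could look roughly like this. It is only a sketch of a planned feature, not existing code in this repo: configure_logging, the logger name whisper_note, and the set of values that count as "on" are all assumptions.

```python
import logging
import os


def configure_logging(env=None):
    """Return a logger that is more verbose when DEV_MODE is enabled.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    by a plain dict for testing.
    """
    env = os.environ if env is None else env
    # Assumed convention: "1", "true", "yes" or "on" (any case) enables dev mode.
    dev_mode = env.get("DEV_MODE", "").strip().lower() in {"1", "true", "yes", "on"}
    level = logging.DEBUG if dev_mode else logging.INFO

    # basicConfig only installs a handler the first time it is called.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logger = logging.getLogger("whisper_note")
    logger.setLevel(level)
    return logger
```

With this in place, running the script with DEV_MODE=1 in the environment would emit DEBUG messages, while a normal run would stay at INFO.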

Special thanks

About

License: GNU General Public License v2.0