Invoke Whisper to transcribe an audio file into text, then GPT to summarize the transcript.
pip install git+https://github.com/yztxwd/voice_to_speech.git
You need an upgraded OpenAI API account to use this repo.
Go to your account settings to find your Organization ID, and generate an API key on the API keys page (remember to copy and save it!).
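The commands below read the API key and organization ID from the shell variables OPENAI_API_KEY and ORGANIZATION_ID. A minimal way to set them for the current session, assuming a bash-compatible shell; the values shown are placeholders, not real credentials:

```shell
# Placeholder values: replace them with the API key and Organization ID from your OpenAI settings
export OPENAI_API_KEY="sk-your-key-here"
export ORGANIZATION_ID="org-your-org-id-here"
```

Adding these lines to a shell profile such as ~/.bashrc keeps them across sessions, at the cost of storing the key in plain text on disk.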
Run the tiny Whisper model on the example audio file:
voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID data/audio.mp3
With the MPS accelerator (Apple M-series chips):
voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID --device mps data/audio.mp3
With CUDA (NVIDIA GPUs):
voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID --device cuda:0 data/audio.mp3
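To avoid editing the command on each machine, a small helper can pick the --device flag from the hardware it finds. This is a sketch under assumptions: pick_device is a hypothetical name, Apple Silicon is detected through uname, an NVIDIA GPU is assumed present when nvidia-smi is on the PATH, and otherwise no --device flag is passed, so the tool falls back to its default as in the first example.

```shell
# Hypothetical helper: print the --device arguments suited to this machine
pick_device() {
  if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    echo "--device mps"      # Apple Silicon (M-chip)
  elif command -v nvidia-smi >/dev/null 2>&1; then
    echo "--device cuda:0"   # first NVIDIA GPU
  fi                         # otherwise print nothing and keep the tool's default device
}
```

Usage (leave $(pick_device) unquoted so "--device mps" splits into two arguments):
voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID $(pick_device) data/audio.mp3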
If you only have a video file, use ffmpeg to extract the audio track first, for example:
# whether re-encoding is needed depends on the audio codec in the video
ffmpeg -i video.mp4 -map 0:a -acodec libmp3lame audio.mp3
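To go from a video straight to a summary, the two steps can be chained in a shell function. This is a sketch: audio_path_for and video_to_summary are hypothetical names, it assumes the video file name has an extension and that OPENAI_API_KEY and ORGANIZATION_ID are already set, and it always re-encodes to MP3 with the same ffmpeg options as above.

```shell
# Hypothetical helper: derive the .mp3 path by replacing the video's last extension
audio_path_for() {
  printf '%s.mp3\n' "${1%.*}"
}

# Hypothetical helper: extract the audio track, then transcribe and summarize it
video_to_summary() {
  local video="$1" audio
  audio="$(audio_path_for "$video")"
  # Only run voice_to_speech if ffmpeg succeeded
  ffmpeg -i "$video" -map 0:a -acodec libmp3lame "$audio" &&
    voice_to_speech -m tiny -a "$OPENAI_API_KEY" -o "$ORGANIZATION_ID" "$audio"
}
```

For example, video_to_summary video.mp4 writes video.mp3 next to the video and then runs the tiny model on it.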