imgta / vialect

Streamline your video/audio intake by transforming multimedia content into navigable collections of transcribed text and summaries!

👾 V/ALect

ViaLect streamlines your media intake by transforming audio into workable text and generated summaries!

Toward words unheard,
traverse digital oceans
for treasures unseen.

Features:

📑 Audio Extraction → Pull from various media platforms or uploads
🛸 ASR & Diarization → Identify and align speakers to timestamped dialogues
🌎 Translation → Detect languages and translate to English
🤖 Speech-to-Text → Accurately transcribe text from extracted audio
💬 Summarization → Focus on key concepts with transcript-based summaries
🔊 Text-to-Speech → Have your generated summaries read back to you
📚 Media Collection → Locally store and navigate your transformed data
🚀 Intuitive UI → Seamless frontend layout via Streamlit
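Aligning speakers to timestamped dialogue amounts to matching two timelines: transcript segments (start, end, text) from Whisper and speaker turns (start, end, label) from a diarization pipeline such as pyannote.audio. ViaLect's own alignment code is not reproduced here. The sketch below assumes both timelines have already been converted to plain records and labels each segment with the speaker whose turns overlap it for the most total time. The Segment, Turn, and assign_speakers names are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical record types for illustration; not ViaLect's data model.
@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, default="UNKNOWN"):
    """Pair each segment with the speaker whose turns overlap it the most in total."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Segments with no overlapping turn fall back to the default label.
        speaker = max(totals, key=totals.get) if totals else default
        labeled.append((speaker, seg))
    return labeled
```

Picking the speaker with the largest total overlap, rather than the one active at a segment's midpoint, keeps a segment that straddles a speaker change with its majority speaker. With pyannote.audio, a diarization result's itertracks(yield_label=True) yields (segment, track, label) tuples whose segment.start and segment.end could populate Turn records.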

Setup:

Key Packages: OpenAI Whisper, PyTorch (CUDA v11.8), pyannote.audio, yt-dlp, Streamlit
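Of these packages, yt-dlp covers pulling audio from media platforms. ViaLect's actual download settings are not documented here; the following is a sketch of fetching an audio-only stream through yt-dlp's Python API and converting it to WAV with yt-dlp's FFmpegExtractAudio postprocessor (which needs ffmpeg installed). The helper names and the "downloads" directory are hypothetical, and WAV output is an assumption.

```python
from pathlib import Path


def ytdlp_audio_options(out_dir: str = "downloads") -> dict:
    """yt-dlp options: fetch the best audio-only stream, then convert it to WAV via ffmpeg."""
    return {
        "format": "bestaudio/best",               # prefer audio-only formats
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",   # name files by video id
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"},  # requires ffmpeg
        ],
        "noplaylist": True,                       # download a single video, not a playlist
        "quiet": True,
    }


def download_audio(url: str, out_dir: str = "downloads") -> str:
    """Hypothetical helper: download url's audio and return the expected .wav path."""
    import yt_dlp  # imported lazily so the options builder works without yt-dlp installed

    with yt_dlp.YoutubeDL(ytdlp_audio_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        # prepare_filename returns the pre-postprocessing name; the extractor swaps the extension.
        return str(Path(ydl.prepare_filename(info)).with_suffix(".wav"))
```

Keeping the options in a plain function makes them easy to inspect or tweak (for example, a different preferredcodec) without touching the download call.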

1. Git clone this repository:

git clone https://github.com/imgta/vialect.git

2. Install ffmpeg and requirements:

sudo apt install ffmpeg
pip install -r requirements.txt

3. Obtain a Hugging Face access token (plus access to the gated pyannote.audio models) and an OpenAI API key
4. Create and fill in .streamlit/secrets.toml (optional: enter the keys in the Secret Keys drawer after launch instead)
5. Launch the Streamlit app:

streamlit run app/Home.py

Usage:

1. Select a Whisper model and options
2. Enter a media URL or upload a video/audio file
3. Submit for transcription
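These steps map roughly onto the openai-whisper Python API: load a model by size, transcribe a file, and read the timestamped segments from the result. This is a sketch, not ViaLect's code. The helper names are hypothetical, "base" is just one of Whisper's model sizes, and task="translate" is Whisper's built-in translate-to-English mode.

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def format_segments(segments) -> str:
    """Render Whisper-style segments (dicts with start/end/text) as timestamped lines."""
    return "\n".join(
        f"[{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe(path: str, model_name: str = "base", translate: bool = False) -> dict:
    """Hypothetical helper: run Whisper on a local audio/video file."""
    import whisper  # openai-whisper, imported lazily; it decodes input via ffmpeg

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    # The result dict includes "text", "segments", and the detected "language".
    return model.transcribe(path, task=task)
```

For example, result = transcribe("talk.wav", "small") followed by print(result["language"]) and print(format_segments(result["segments"])) would show the detected language and a timestamped transcript.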

Roadmap:

  • Generate summary based on transcript text
  • Create and store text embeddings in vector DB for RAG querying
  • ASR/Diarization for timestamps => reduced sliding window
  • Partition overlap + translation stress test (Anime subbing?)
  • Realtime ASR
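For the planned partition overlap, one common approach is to cut long audio into fixed-length windows that share a few seconds with their neighbors, so a word split at a boundary appears whole in at least one chunk. This is only an illustration of that idea, not the roadmap's design; overlapping_windows and its parameters are hypothetical.

```python
def overlapping_windows(duration: float, window: float, overlap: float) -> list[tuple[float, float]]:
    """Split [0, duration] into windows of length `window` that overlap by `overlap` seconds."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    step = window - overlap
    bounds = []
    i = 0
    while True:
        start = i * step  # multiply instead of accumulating to avoid float drift
        if start >= duration:
            break
        end = min(start + window, duration)  # last window is clipped to the audio length
        bounds.append((start, end))
        if end >= duration:
            break
        i += 1
    return bounds
```

After transcribing each window, text from the overlapping regions would still need de-duplicating, for example by keeping only words whose timestamps fall in the first half of each overlap.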
