ekaj2 / multimodal-gpt

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

A Screenshot-based Multimodal GPT Assistant

  1. Python sounddevice for recording audio until you stop speaking
  2. Whisper API for transcribing audio
  3. OpenAI TTS for speech
  4. PyWinCtl and pyautogui for screenshots of a specific window
  5. OpenAI Vision API to process the screenshot and answer your prompt

Installation

python -m venv venv
. venv/bin/activate
pip install -r requirements.txt

Run

python main.py

Configuration

All project-wide settings are in settings.py.

About

License:Other


Languages

Language:Python 100.0%