Aavato-c / 20231124_ai_voice_camera_demo

A set of scripts to capture pictures from a webcam feed and then getting the description of them from openai. After that, we covert the text to speech using Elevenlabs.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Elevenlabs webcam descriptor

The source is quite simple. Go ahead and check it out, I don't think it's necessary to explain everything here but in short:

  • This will save frames from your webcam and store them in the media -folder
    • The most current frame will be updated
  • The other app makes calls to openai to get a description of the image, then send a request to elevenlabs to make a dub.
    • The sound files will also be stored

To start using this little thing do the following:

  • Make a venv using python3.8
    • python3.8 venv -m venv
  • Activate the environment
    • source venv/bin/activate
  • Install dependencies
    • pip install -r requirements.txt
  • Make your own env
    • You can use the env.example provided, remove the .example extension and fill in your api keys
  • Run the save_video_frames.py
    • python3 save_video_frames.py
  • Run the main_app.py
    • python3 main_app.py
  • Enjoy

Made with a mac M1

About

A set of scripts to capture pictures from a webcam feed and then getting the description of them from openai. After that, we covert the text to speech using Elevenlabs.

License:MIT License


Languages

Language:Python 100.0%