cfortuner / jarvis

jarvis

Potential use cases

  • quick switch to application or a tab
  • lay out application windows in a grid
  • automate interactions with applications (share)
  • start/stop recording
  • take screenshot
  • quick note taking
  • copy to different clipboard buffers, e.g. "copy this as blah", "paste from blah"
  • set reminders
  • notify
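
As an illustration of the clipboard-buffer use case, here is a minimal sketch of how the two spoken phrases could map onto named buffers. The class name, the regular expressions, and the `selection` argument are hypothetical and are not taken from the Jarvis code base; a real implementation would also need to read from and write to the system clipboard.

```python
import re
from typing import Dict, Optional

# Hypothetical phrase patterns; the project's real command grammar may differ.
COPY_PATTERN = re.compile(r"^copy this as (?P<name>.+)$", re.IGNORECASE)
PASTE_PATTERN = re.compile(r"^paste from (?P<name>.+)$", re.IGNORECASE)


class NamedClipboard:
    """Keeps one text buffer per spoken name."""

    def __init__(self) -> None:
        self._buffers: Dict[str, str] = {}

    def handle(self, command: str, selection: str = "") -> Optional[str]:
        """Store `selection` on a copy command, or return stored text on a paste command.

        Returns None for copy commands, unknown buffer names, and unrecognized phrases.
        """
        text = command.strip()
        copy_match = COPY_PATTERN.match(text)
        if copy_match:
            self._buffers[copy_match.group("name").lower()] = selection
            return None
        paste_match = PASTE_PATTERN.match(text)
        if paste_match:
            return self._buffers.get(paste_match.group("name").lower())
        return None
```

Buffer names are lower-cased so that "Blah" and "blah" refer to the same buffer, which seems safer for transcribed speech.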

We try to follow the Google Python Style Guide.

Roadmap

See the Brainstorm doc for the "crazy ideas."

Features / Enhancements

  • Keyboard shortcut to open App and click record
  • "Listening" animation after clicking record
  • Dedicated console on GUI for debug logs (currently logs are truncated)
  • Record from history (save a sequence of voice commands as a macro)

Bugs / Known Issues

  • If you forget to say "exit" in stream mode and leave the microphone running for a while before your next command, Google keeps recording audio and then tries to process an extremely large transcript, which causes the program to time out or lag. We need a way to detect silence in stream mode and clear the audio buffer. One option is a timeout parameter in the Microphone or GoogleTranscriber that clears the buffer if no commands are heard for N seconds.
  • [Mac] The TaskBar loads slowly and gets stuck sometimes
  • [Mac] "Switch to X" gets stuck if the program is minimized
  • [Mac] GUI layout isn't formatted properly. There appear to be differences between monitors or operating systems that we need to work out. Ideally the GUI looks the same across all monitors and operating systems.
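
The silence timeout proposed in the first issue could look roughly like the sketch below. It is not the project's Microphone or GoogleTranscriber code: `SilenceAwareBuffer`, its method names, and the injectable `clock` parameter are assumptions made for illustration. The idea is only to track when the last command was heard and to drop buffered audio once N seconds pass without one.

```python
import time
from typing import Callable, List


class SilenceAwareBuffer:
    """Accumulates audio chunks and drops them after a stretch with no commands."""

    def __init__(self, timeout_sec: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_sec = timeout_sec
        self._clock = clock  # injectable so the timeout can be tested without sleeping
        self._chunks: List[bytes] = []
        self._last_command_at = clock()

    def add_chunk(self, chunk: bytes) -> None:
        """Append audio, first clearing stale audio if the timeout has elapsed."""
        self.clear_if_silent()
        self._chunks.append(chunk)

    def mark_command_heard(self) -> None:
        """Call whenever the transcriber recognizes a command."""
        self._last_command_at = self._clock()

    def clear_if_silent(self) -> bool:
        """Clear the buffer when no command was heard for timeout_sec seconds."""
        now = self._clock()
        if now - self._last_command_at >= self.timeout_sec:
            self._chunks.clear()
            self._last_command_at = now  # restart the window after clearing
            return True
        return False

    def pending_bytes(self) -> int:
        """Total size of the audio currently held in the buffer."""
        return sum(len(chunk) for chunk in self._chunks)
```

Because the window restarts after each clear, a microphone left running in silence is trimmed every `timeout_sec` seconds, so the transcript sent for recognition should stay bounded.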

UI Features/Bugs

  • Use real-time sound detection from mic to play animation (don't wait for google)?
  • Show/Hide GUI using the Python API (make keyboard shortcut)
  • Send Show/Hide event to Python when the user opens the window
  • UI should always be on top of all windows (pin the window)?

Developer Setup

Mac Setup

(Tested on macOS Big Sur 11.4, on both M1 and Intel chips)

  1. Install Pyenv and Python 3.8.10
brew install pyenv
pyenv install 3.8.10
pyenv global 3.8.10

# Run this and follow instructions for how to update your PATH, ~/.profile, ~/.zprofile, and ~/.zshrc. Then do a full logout and log back in.
pyenv init

# Verify pyenv is working
>> python -V 
Python 3.8.10
  1. Install homebrew prerequisites
# Microphone support
brew install portaudio

# Sphinx NLP library (Optional, also requires python 3.6)
# https://pypi.org/project/pocketsphinx/
# https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst
brew install swig

# For AppKit
brew install cairo gobject-introspection

# For Kivy
# https://kivy.org/doc/stable/installation/installation-osx.html#install-source-os
brew install pkg-config sdl2 sdl2_image sdl2_ttf sdl2_mixer gstreamer

brew install openssl
  1. Update environment variables to properly configure clang

Either add these to ~/.profile or run them manually in the shell before running pip install -r requirements.txt

export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1
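# /opt/homebrew is the Homebrew prefix on Apple Silicon; on Intel Macs it is /usr/local (check with: brew --prefix openssl)
export CPLUS_INCLUDE_PATH="${CPLUS_INCLUDE_PATH:+${CPLUS_INCLUDE_PATH}:}/opt/homebrew/opt/openssl/include"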
  1. Install Chrome Web driver (for browser automation)

Follow the ChromeDriver installation instructions. Download the driver version that matches your installed version of Chrome. On macOS you also have to grant the web driver permission to run.

  1. Set up Google Cloud Project

First, create a GCP project or use the Jarvis one (jarvis-1626279785926). If you create one, you'll need to set up a billing account and enable the Cloud Speech APIs.

Next, install the Google Cloud SDK (https://cloud.google.com/sdk/docs/install) and configure it.

gcloud init
gcloud config list

# Should see something like
[core]
account = bfortuner@gmail.com
disable_usage_reporting = True
project = jarvis-1626279785926

# Login to get credentials
gcloud auth application-default login  
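
To confirm that the login step left credentials on disk before running the speech examples, a small standard-library check along these lines may help. It is a sketch, not part of the project: the function name is hypothetical, and it only looks at the `GOOGLE_APPLICATION_CREDENTIALS` variable and the default macOS/Linux location used by `gcloud auth application-default login`. It does not handle a custom `CLOUDSDK_CONFIG` directory or Windows.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_application_default_credentials(
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the credentials file the Google client libraries would likely pick up.

    Checks GOOGLE_APPLICATION_CREDENTIALS first, then the file written by
    `gcloud auth application-default login` on macOS and Linux.
    """
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)
    base = home if home is not None else Path.home()
    well_known = base / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None


if __name__ == "__main__":
    found = find_application_default_credentials()
    print(found if found else "No application default credentials found")
```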

Ubuntu Setup

(Tested on Ubuntu 20.04)

  1. Install Python Virtual Environment
sudo apt install python3-venv
  1. Install library dependencies
sudo apt install python3.8-dev

# Kivy depends on this
sudo apt install python3-tk
sudo apt install libcairo2-dev

# SpeechRecognition package depends on these
sudo apt install libportaudio2 portaudio19-dev

# PyGObject depends on this
sudo apt install libgirepository1.0-dev

# Taskbar icon support requires this
sudo apt install gir1.2-appindicator3-0.1

# If you run without a GUI and pyautogui raises KeyError: 'DISPLAY', add this to ~/.bashrc (or your shell's equivalent)
export DISPLAY=:0

Python Setup

  1. Create Virtualenv (Python 3.8)
pip3 install virtualenv
virtualenv .venv --python=python3
source .venv/bin/activate
  1. Install python dependencies
# export ARCHFLAGS="-arch x86_64"  # for pyaudio on older versions of MacOS (not required on Big Sur)
pip install -r requirements.txt
  1. Install Kivy (Mac Only)
# The M1 architecture requires we install Kivy from source
# https://kivy.org/doc/stable/gettingstarted/installation.html#from-source
# GitHub no longer serves the unauthenticated git:// protocol, so clone over HTTPS
git clone https://github.com/kivy/kivy.git kivy_repo && cd kivy_repo
python -m pip install -e ".[base]"  && cd ..
  1. Install atomac (Mac Only)

Atomac seems to have a dependency problem that prevents installing it directly with pip install, so we need to install it from source.

git clone https://github.com/pyatom/pyatom.git pyatom_repo && cd pyatom_repo
python -m pip install future
python -m pip install . && cd ..
  1. Verify things are working
# Say something
python scratch/speech_recognition_examples.py

# Verify GCP auth is working
python scratch/google_speech_recognition_example.py

# A window with "Hello world" should open
python scratch/kivy_example.py

# Verify Selenium is installed correctly
python -m scratch.selenium_example

# Run the Kivy app (then click Record and "Switch to Chrome")
python main.py

# OR, run the electron app
python electron.py

# And in prophet...
npm install
npm run start
  1. App steps
# Load contacts from google
python -m higgins.automation.contacts.google

# Install pytorch
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Testing

Run unit tests

# Optionally prepend SPEED_LIMIT=N_SEC to slow down the automation when debugging
pytest tests/
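
For readers wondering how a variable like SPEED_LIMIT can slow the automation down, here is one plausible shape for it. This is an assumption-laden sketch, not the project's actual implementation: `speed_limit_seconds` and `run_step` are hypothetical names, and the real code may apply the delay elsewhere.

```python
import os
import time
from typing import Callable, Mapping


def speed_limit_seconds(env: Mapping[str, str] = os.environ) -> float:
    """Parse SPEED_LIMIT as a non-negative number of seconds; default to 0."""
    raw = env.get("SPEED_LIMIT", "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        return 0.0


def run_step(
    step: Callable[[], None],
    env: Mapping[str, str] = os.environ,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run one automation step, then pause for the configured delay."""
    step()
    delay = speed_limit_seconds(env)
    if delay > 0:
        sleep(delay)
```

With something like this in place, `SPEED_LIMIT=2 pytest tests/` would pause two seconds after every step, while an unset or malformed value leaves the tests at full speed.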

Distributing the app

Some options for packaging the app into a native executable:

Links

Documentation

For creating menu bars on MacOS

Audio Libraries

Voice Programming

Speech Recognition

Microphone/audio recording example

Audio ML

Datasets

Python (background tasks and queues)

NLP

Code Generation

NLP-to-Bash

Desktop Launcher Tools (Mac Spotlight++)

Headless Browsers

Personal Assistant

Semantic Search

GPT3 Notes

Ideas to improve truthfulness

  • Generator samples 10 answers, discriminator evaluates the answers and selects the best
  • Fine-tune the model on your facts
  • Lower the temperature
  • Include "I don't know" as a valid response, with examples (false positives)
  • Incorporate the model's confidence (log probs?) to evaluate the reply (and determine how many times to sample?)
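
Several of these ideas combine naturally into a best-of-N loop. The sketch below assumes two caller-supplied functions that are not defined here or in the project: `generate`, which samples one answer from the model, and `score`, which plays the discriminator role (it could be a second model or a value derived from log probabilities). The `min_score` threshold is likewise an illustrative parameter.

```python
from typing import Callable, List, Tuple

IDK = "I don't know"


def best_of_n(
    question: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 10,
    min_score: float = 0.5,
) -> Tuple[str, float]:
    """Sample n answers (n >= 1), keep the highest-scoring one, or fall back to IDK."""
    candidates: List[str] = [generate(question) for _ in range(n)]
    scored = [(score(question, answer), answer) for answer in candidates]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    if best_score < min_score:
        # Treat a low discriminator score as "the model does not actually know".
        return IDK, best_score
    return best_answer, best_score
```

Returning the score alongside the answer lets the caller decide whether to sample again with a larger `n`.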

Ideas to process large documents

Website for GPT-based projects: http://gptcrush.com/

Email-related product built on GPT: https://www.hypertype.co/

  • NOTE: You pay money for every document searched
  • Pre-search the data with a cheap model (Ada) or a non-model-based search engine (Gmail API, ElasticSearch, txtai)
  • Break large documents into snippets
  • Pre-process the document into summaries or salient facts
  • Run semantic search, then completion (like answers/ endpoint)
  • Example scenario: 6M documents with fast search via SOLR (or ElasticSearch)
  • Steps
    • Search relevant documents with cheap local engine (elasticsearch, SOLR)
    • For each of these articles, dynamically upload only its most relevant chunks
    • Pass results to semantic search or answers endpoint
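
The steps above can be sketched with the standard library alone. In this illustration a simple term-overlap count stands in for the cheap local engine (ElasticSearch or SOLR in practice), and every function name is hypothetical. The output is the small set of snippets that would then be passed to the paid semantic search or answers endpoint.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_words(text: str, chunk_size: int = 50) -> List[str]:
    """Split text into snippets of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def overlap_score(query: str, text: str) -> int:
    """Count distinct query terms that appear in text (a stand-in for a real search engine)."""
    return len(set(tokenize(query)) & set(tokenize(text)))


def select_chunks(
    query: str,
    documents: Dict[str, str],
    top_docs: int = 3,
    chunk_size: int = 50,
) -> List[Tuple[str, str]]:
    """Return (doc_id, best_chunk) for the top_docs documents that match the query."""
    ranked = sorted(
        documents,
        key=lambda doc_id: overlap_score(query, documents[doc_id]),
        reverse=True,
    )
    selected = []
    for doc_id in ranked[:top_docs]:
        if overlap_score(query, documents[doc_id]) == 0:
            continue  # skip documents that share no terms with the query
        chunks = chunk_words(documents[doc_id], chunk_size)
        best = max(chunks, key=lambda chunk: overlap_score(query, chunk))
        selected.append((doc_id, best))
    return selected
```

Only the selected chunks reach the expensive model, which is the point of the pre-search: cost scales with the number of chunks sent rather than with the size of the corpus.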

Semantic Search / Email Processing

Data Labeling

Large Document IR and Summarization
