Screenshot2Text

Main Functionality

Convert an image to text by pasting its file path
Convert the most recent screenshot to text
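The two modes above boil down to picking an input file: either the newest screenshot or a pasted path. Below is a minimal sketch of that selection step, assuming screenshots land in a single folder and are identified by file extension. The folder, the `IMAGE_EXTENSIONS` set, and the helper names `latest_image` / `resolve_input` are hypothetical and not taken from ocr.py.

```python
from pathlib import Path
from typing import Optional

# Hypothetical set of extensions treated as screenshots.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def latest_image(folder: Path) -> Optional[Path]:
    """Return the most recently modified image in `folder`, or None if there is none."""
    images = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    ]
    # Pick the image with the newest modification time.
    return max(images, key=lambda p: p.stat().st_mtime, default=None)


def resolve_input(user_text: str, screenshot_folder: Path) -> Optional[Path]:
    """Empty input -> newest screenshot; otherwise treat the text as a pasted file path."""
    # Windows "Copy as path" wraps the path in double quotes, so strip quotes first.
    text = user_text.strip().strip('"').strip("'")
    if not text:
        return latest_image(screenshot_folder)
    return Path(text).expanduser()
```

Using modification time rather than file name keeps the lookup independent of how the OS names its screenshots.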

Windows Setup

Create a virtual environment using Python >= 3.8: py -3.8 -m venv env
Activate the environment: env\Scripts\activate.bat
Install the Python libraries: pip install -r requirements.txt
Download the unofficial Tesseract installer for Windows: tesseract-ocr-w64-setup-v5.1.0.20220510.exe (64-bit)
Add the path of the Tesseract installation (C:\Program Files\Tesseract-OCR by default)
For convenience, you can create a desktop shortcut that activates the environment and runs the script. For example:
- Target: C:\Windows\System32\cmd.exe /K "D:\Project\Python\2022\Screenshot2Text\env\Scripts\activate.bat" && ocr.py
- Start in: D:\Project\Python\2022\Screenshot2Text
For more languages, download the .traineddata files from https://github.com/tesseract-ocr/tessdata and put them into C:\Program Files\Tesseract-OCR\tessdata
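If Tesseract is installed but its folder is not on PATH, the script cannot find the executable. The sketch below checks PATH first and then falls back to the default install location mentioned above. The helper name `find_tesseract` is hypothetical. It also assumes the project uses pytesseract, which the README does not state; if so, `pytesseract.pytesseract.tesseract_cmd` is the attribute that points pytesseract at a specific executable.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default location of the Windows installer (matches the tessdata path above).
DEFAULT_WINDOWS_PATHS = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates: Iterable[str] = DEFAULT_WINDOWS_PATHS,
                   search_path: bool = True) -> Optional[str]:
    """Return the tesseract executable: first from PATH, then from known install paths."""
    if search_path:
        found = shutil.which("tesseract")
        if found:
            return found
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    exe = find_tesseract()
    print(exe or "tesseract not found; install it or add its folder to PATH")
    # Assumption: if pytesseract is the OCR wrapper, point it at the executable:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = exe
```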

Linux Setup

Install Tesseract

sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update
sudo apt install -y tesseract-ocr
tesseract --version
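The `tesseract --version` check can also be done from Python, for example to warn early when the binary is missing. This is a sketch, not the project's code. The helper names are hypothetical, and the regex assumes the first output line looks like `tesseract 5.3.0` or, on the Windows builds, `tesseract v5.1.0.20220510`.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_tesseract_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_tesseract_version() -> Optional[Tuple[int, int]]:
    """Run `tesseract --version`; return None if the binary is missing or fails."""
    try:
        result = subprocess.run(["tesseract", "--version"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    # Older builds may print the version to stderr, so search both streams.
    return parse_tesseract_version(result.stdout + result.stderr)
```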

Create Python virtual environment and activate it

python3.10 -m venv env
. env/bin/activate
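A common mistake is running the script with the system interpreter instead of the one in `env`. The sketch below detects that by comparing `sys.prefix` with `sys.base_prefix`, which differ inside a venv. It also checks the Python >= 3.8 requirement from the Windows steps. The `check_environment` helper is hypothetical and not part of the project.

```python
import sys
from typing import List

MIN_PYTHON = (3, 8)  # the Windows setup asks for Python >= 3.8


def in_virtualenv() -> bool:
    """True when running inside a venv (its prefix differs from the base interpreter's)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_environment() -> List[str]:
    """Return human-readable problems; an empty list means the environment looks fine."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    if not in_virtualenv():
        problems.append("no virtual environment is active; activate env first")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```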

Install dependencies

pip install -r requirements.txt

For more languages:
- Method 1: Download the .traineddata file from https://github.com/tesseract-ocr/tessdata, then copy it to /usr/share/tesseract-ocr/<version>/tessdata
- Method 2: sudo apt-get install tesseract-ocr-<lang> (e.g. tesseract-ocr-fra for French)
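Method 1 is a plain file copy: Tesseract identifies a language by the `<lang>.traineddata` file name inside its tessdata folder. Below is a minimal sketch of copying a downloaded file and listing what is installed. The helper names are hypothetical. Writing under /usr/share normally needs root, so in practice run it with sudo or copy the file by hand.

```python
import shutil
from pathlib import Path
from typing import List


def installed_languages(tessdata_dir: Path) -> List[str]:
    """List language codes (e.g. 'eng') for every .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))


def add_language(traineddata_file: Path, tessdata_dir: Path) -> Path:
    """Copy a downloaded <lang>.traineddata file into an existing tessdata_dir."""
    if traineddata_file.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {traineddata_file}")
    # copy2 returns the destination path, which is tessdata_dir / <file name> here.
    return Path(shutil.copy2(traineddata_file, tessdata_dir))
```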
To open it quickly, add a custom keyboard shortcut with your own key combination that runs: gnome-terminal --window-with-profile=Mini -x bash -c 'cd /home/<username>/Desktop/personal/Screenshot2Text ; source <venv-name>/bin/activate ; python ocr.py ; deactivate'
- Optional: --window-with-profile=Mini opens the window with a terminal profile named Mini, which must already exist; drop it to use the default profile
- Reference: https://askubuntu.com/questions/1072688/what-is-the-difference-between-the-e-and-x-options-for-gnome-terminal

Mac OS Setup

Install Tesseract

brew install tesseract
tesseract --list-langs
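The same language check can be run from Python before OCR, for instance to confirm `eng` is present. This is a sketch with hypothetical helper names. It assumes the output format of recent Tesseract versions: a header line such as `List of available languages in "/opt/homebrew/share/tessdata/" (3):` followed by one code per line.

```python
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Parse `tesseract --list-langs` output into a list of language codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the "List of available languages ..." header if present.
    if lines and lines[0].lower().startswith("list of available languages"):
        lines = lines[1:]
    return lines


def available_languages() -> List[str]:
    """Return installed language codes, or [] if tesseract cannot be run."""
    try:
        result = subprocess.run(["tesseract", "--list-langs"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    # Fall back to stderr in case an older build prints the list there.
    return parse_list_langs(result.stdout or result.stderr)
```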

Create Python virtual environment and activate it

python3.12 -m venv env
. env/bin/activate

Install dependencies

pip3 install -r requirements.txt

To install all languages:

brew install tesseract-lang

Or copy the required .traineddata files from this folder to /opt/homebrew/share/tessdata/ (the Homebrew location on Apple Silicon; Intel Macs use /usr/local/share/tessdata/).

Add an alias to run the script from the terminal, since macOS has no equivalent of the Linux custom shortcut that opens a terminal and runs the script:

# Add this line into ~/.zshrc
alias ocr='cd ~/PATH/TO/Screenshot2Text ; env/bin/python3 ocr.py'

How to use?

Run the script
Press Enter to use the most recent screenshot, or paste the image file path
The extracted text is copied directly to your clipboard
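The README does not say how the clipboard copy is implemented; a library such as pyperclip (`pyperclip.copy(text)`) is one common option. As an alternative, the sketch below pipes the text into the platform's clipboard tool. The helper names are hypothetical, and xclip is an assumed extra install on Linux (`sudo apt install xclip`). The Windows `clip` command may mangle non-ASCII text, which is one reason a library is often preferred.

```python
import subprocess
import sys
from typing import List


def clipboard_command(platform: str = sys.platform) -> List[str]:
    """Return a command that reads text on stdin and places it on the clipboard."""
    if platform.startswith("win"):
        return ["clip"]
    if platform == "darwin":
        return ["pbcopy"]
    # Linux/X11: assumes xclip is installed.
    return ["xclip", "-selection", "clipboard"]


def copy_to_clipboard(text: str) -> None:
    """Send `text` to the clipboard tool for the current platform."""
    subprocess.run(clipboard_command(), input=text, text=True, check=True)
```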

Future work

URL images
Google drive link
Linux compatibility
macOS compatibility
A simple PyQt UI
Google Chrome and Firefox extensions for extracting text
Preprocess images by inverting dark images into bright ones for better Tesseract extraction
Multimodal image description generation
An LLM for answering simple questions in the PyQt UI
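The planned inversion step could work like this: measure the average brightness of a grayscale image and invert it when it is mostly dark, so text ends up dark on a light background. The sketch below is illustrative only. It works on raw 8-bit grayscale bytes to stay dependency-free, and the 128 threshold and helper names are hypothetical choices. With Pillow, the bytes would come from `image.convert("L").tobytes()` and could be turned back into an image with `Image.frombytes("L", image.size, data)`; `ImageOps.invert` performs the inversion itself.

```python
def mean_brightness(pixels: bytes) -> float:
    """Average of 8-bit grayscale values (0 = black, 255 = white)."""
    return sum(pixels) / len(pixels) if pixels else 0.0


def normalize_for_ocr(pixels: bytes, threshold: float = 128.0) -> bytes:
    """Invert a mostly dark image (e.g. a dark-mode screenshot); leave light ones unchanged."""
    if mean_brightness(pixels) < threshold:
        return bytes(255 - value for value in pixels)
    return pixels
```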

jwtanx / Screenshot2Text