jeturgavli / ImageToText

Extract text data from screenshot images into an Excel file

image-to-text tesseract-ocr

Text Extraction and Excel Parsing from Images

This Python script extracts text from images using Tesseract OCR and organizes it into an Excel file.

Features

Automated Installation: Checks for required Python modules (pytesseract, openpyxl, pandas) and installs them if missing.
Text Extraction: Utilizes Tesseract OCR to extract text from images.
Data Parsing: Parses the extracted text for contact names and times seen, then writes them to an Excel file.
Logging: Logs informative messages, warnings, and errors for better tracking and debugging.
User Interaction: Prompts the user for image and output folder paths, allowing for interactive usage.
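
The parsing step could look roughly like the sketch below. It is not the code from main.py: the layout it assumes (a contact name on one line, followed by a line containing a clock time such as "Today, 10:30 AM") is a guess at what the screenshots contain. The names `parse_contacts`, `TIME_PATTERN`, and the column labels are hypothetical.

```python
import logging
import re

logger = logging.getLogger("image_to_text_sketch")

# Assumed layout (not confirmed by the project): a contact name on one line,
# followed by a line containing a clock time such as "Today, 10:30 AM".
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\s*(?:AM|PM)?", re.IGNORECASE)


def parse_contacts(ocr_text):
    """Pair each name line with the time line that follows it."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    rows = []
    pending_name = None
    for line in lines:
        if TIME_PATTERN.search(line):
            if pending_name is None:
                # A time with no name before it cannot be attributed; skip it.
                logger.warning("Time without a preceding name: %r", line)
                continue
            rows.append({"Contact Name": pending_name, "Time Seen": line})
            pending_name = None
        else:
            if pending_name is not None:
                # Two name-like lines in a row: the first one had no time.
                logger.warning("Name without a time: %r", pending_name)
            pending_name = line
    return rows
```

The resulting list of dictionaries can be written out with `pandas.DataFrame(rows).to_excel("contacts.xlsx", index=False)`, which uses openpyxl for `.xlsx` files. The output file name here is only an example.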

Usage

Ensure Python is installed.
Install Tesseract OCR:
- Windows:
  - Download the installer from https://github.com/UB-Mannheim/tesseract/wiki.
  - Run the installer and follow the installation instructions.
  - Add the Tesseract installation directory to the system's PATH environment variable.
- Linux:
  - Use your package manager to install Tesseract OCR. For example, on Ubuntu:
```
sudo apt-get update
sudo apt-get install tesseract-ocr
```
- macOS:
  - Install Tesseract OCR using Homebrew:
```
brew install tesseract
```
Clone or download the repository.
Place images to be processed in the images folder.
Run the script (main.py).
Follow the prompts to input image and output folder paths.
View the generated Excel files in the output folder.
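
For orientation, the core of the steps above (read every image in a folder and run OCR on it) might be sketched as follows. This is an illustration under assumptions, not the contents of main.py: the helper names `list_images` and `ocr_folder` and the set of file extensions are hypothetical. The only pytesseract call used is `pytesseract.image_to_string`, which accepts a file path.

```python
from pathlib import Path

# Hypothetical list of extensions treated as images in this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Return the image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(image_folder):
    """Map each image file name to the text Tesseract reads from it."""
    # Imported lazily so list_images() works without pytesseract installed.
    import pytesseract

    return {
        path.name: pytesseract.image_to_string(str(path))
        for path in list_images(image_folder)
    }
```

Calling `ocr_folder("images")` requires both the pytesseract module and a working Tesseract installation. The returned text would then be parsed and written to Excel.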

Dependencies

Python 3.x
Tesseract OCR
Required Python modules: pytesseract, openpyxl, pandas
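
Before running the script, the dependencies can be checked by hand. The sketch below is one possible way to do that with the standard library only. It is not the installer logic from main.py, and the function names `find_missing_modules` and `tesseract_on_path` are hypothetical.

```python
import importlib.util
import shutil

REQUIRED_MODULES = ("pytesseract", "openpyxl", "pandas")


def find_missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def tesseract_on_path():
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules, try: pip install " + " ".join(missing))
    if not tesseract_on_path():
        print("Tesseract OCR was not found on PATH.")
```

If Tesseract is installed but not on PATH, pytesseract also lets you point at the executable directly by setting `pytesseract.pytesseract.tesseract_cmd` to its full path.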

Author

Contribution

License

This project is licensed under the MIT License.
