The project is organized into five main directories: Models, Results, UI, Notebooks, and Data.
- Models: Contains the models for each OCR method tested.
- Results: Stores the output from each of the OCR methods.
- UI: Houses the StreamlitUI.py file that runs a web server.
- Notebooks: Contains notebooks for each OCR method.
- Data: Contains our custom-compiled dataset of images of ingredient lists and the backs of various packaged food products.
This project focuses on the extraction of text from images using various Optical Character Recognition (OCR) methods. The goal is to accurately and efficiently convert image-based text into machine-readable text. The OCR method extracts the text, and our web server component uses ChatGPT to extract the relevant information (allergies and dietary restrictions) from that text.
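As a rough illustration of the extraction step (not the project's actual code), the request to ChatGPT might be assembled from the OCR output like this; the function name and prompt wording are assumptions, and the live API call is omitted since it needs a key:

```python
def build_extraction_prompt(ocr_text: str) -> str:
    """Assemble a prompt asking ChatGPT to pull allergens and dietary
    restrictions out of raw OCR output (illustrative sketch only)."""
    return (
        "The following text was extracted from a food package by OCR. "
        "List any allergens and dietary restrictions it mentions.\n\n"
        f"OCR text:\n{ocr_text}"
    )

# The prompt would then be sent via the OpenAI chat completions API;
# that call is left out here because it requires a live API key.
prompt = build_extraction_prompt("INGREDIENTS: wheat flour, milk, soy lecithin")
```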
We have explored and implemented several OCR methods in this project:
- GOCR: A traditional OCR approach that provides a baseline for our experiments.
- Tesseract: A popular OCR engine that uses deep learning techniques.
- ChatGPT Vision: A novel approach that directly converts images to text.
- EasyOCR Model: A model from the EasyOCR library that we have fine-tuned on our dataset.
- MMOCR Model: A highly sophisticated library, leveraging state-of-the-art models to optimize images for text detection and recognition tasks.
- Final Model: Our custom model that combines Tesseract and a data transformation pipeline. This model preprocesses the image, applies OCR using Tesseract, and then processes the OCR text with ChatGPT to extract the ingredient list.
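The three stages of the Final Model described above can be sketched as a simple pipeline. This is a minimal sketch, not the project's implementation: the stages are injected as callables (with trivial stand-ins below) so it runs without Tesseract or an OpenAI key; in the real project they would be the image preprocessing, a Tesseract call, and a ChatGPT call respectively.

```python
from typing import Callable

def final_pipeline(
    image_path: str,
    preprocess: Callable[[str], str],
    ocr: Callable[[str], str],
    extract: Callable[[str], str],
) -> str:
    """Sketch of the Final Model's three stages: preprocess the image,
    run OCR (Tesseract in the real project), then hand the raw text to
    ChatGPT to pull out the ingredient list."""
    cleaned = preprocess(image_path)
    raw_text = ocr(cleaned)
    return extract(raw_text)

# Stand-in stages for demonstration only.
result = final_pipeline(
    "label.png",
    preprocess=lambda p: p,                        # no-op preprocessing
    ocr=lambda p: "INGREDIENTS: oats, honey",      # fake OCR output
    extract=lambda t: t.split(":", 1)[1].strip(),  # trivial extractor
)
# result == "oats, honey"
```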
Each of these methods has corresponding files in the Models and Notebooks folders.
Run make install to install the dependencies listed in requirements.txt, then cd into Models to run each model.
We have developed a user interface using Streamlit. The UI.py file in the UI folder runs a web server that deploys our basic pipeline. Users can upload images and receive back an allergy list.
- python -m venv venv
- source venv/bin/activate
- pip install -r requirements.txt
- Add your OpenAI API key to the .env file
- streamlit run UI/UI.py
The Results folder contains the output from each of the OCR methods tested. This allows us to compare the performance of each method and make informed decisions about which methods to use or further develop.
We have documented our testing, tabulation, reports, and comparisons between our different approaches on the Weights & Biases platform. You can view our project, along with the results from all of the models on our dataset, here: https://wandb.ai/aipi549/aipi540/overview?workspace=user-hongxuanli