ocr keyword-extraction image-to-text tesseract-ocr

OCR Bot 🤖

This action uses naptha/tesseract.js to extract text from images attached to issue comments.

The extracted text is appended to the issue body.

This makes the extracted text searchable through GitHub's search box.
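To illustrate the first step, finding the images to run OCR on, here is a minimal sketch. It is not the action's actual source: `extractImageUrls` is a hypothetical helper, and it assumes images appear in the issue text either as HTML `<img>` tags or as Markdown image syntax.

```javascript
// Hypothetical helper, not taken from the action's source.
// Assumption: images are embedded as <img src="..."> or ![alt](url).
function extractImageUrls(body) {
  if (!body) return [];

  const htmlImg = /<img\b[^>]*?\bsrc=["']([^"']+)["'][^>]*>/gi;
  const markdownImg = /!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)/g;

  const urls = [];
  for (const pattern of [htmlImg, markdownImg]) {
    for (const match of body.matchAll(pattern)) {
      urls.push(match[1]); // capture group 1 holds the image URL
    }
  }

  // Drop duplicates, keeping the order in which matches were collected.
  return [...new Set(urls)];
}

const sample =
  'Crash on start <img width="400" src="https://example.com/a.png"> ' +
  "and ![shot](https://example.com/b.png), see [docs](https://example.com/c)";
console.log(extractImageUrls(sample));
```

Plain links and bare URLs are deliberately ignored, so only embedded images would be passed on to the OCR step.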

Usage

Create a workflow (e.g. .github/workflows/ocr-bot.yml; see Creating a Workflow file) with the following content:

name: "OCR Bot"
on:
  issues:
    types: [opened, edited]

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: thehanimo/ocr-bot@v1.0.0
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Done! You should see OCR keywords being added to issues that contain images. Something like this:

OCR Keywords

Mild Splendour of the various-vested Night! Mother of wildly-working visions! haill I watch thy gliding, while with watery light Thy weak eye glimmers through a fleecy veil; And when thou lovest thy pale orb to shroud Behind the gather’d blackness lost on high; And when thou dartest from the wind-rent cloud Thy placid lightning o’er the awaken’d sky.
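Because the workflow also runs on edited issues, the keywords section is best written so that repeated runs do not stack up duplicates. The sketch below shows one way this could be done. It is an assumption about a possible approach, not the action's actual implementation: the marker comments and the `withOcrKeywords` and `stripOcrKeywords` helpers are hypothetical.

```javascript
// Hypothetical markers that delimit the generated section in the issue body.
const START = "<!-- ocr-keywords:start -->";
const END = "<!-- ocr-keywords:end -->";

// Remove a previously generated section, if one is present.
function stripOcrKeywords(body) {
  const start = body.indexOf(START);
  const end = body.indexOf(END);
  if (start === -1 || end === -1 || end < start) return body.trimEnd();
  return (body.slice(0, start) + body.slice(end + END.length)).trimEnd();
}

// Return the issue body with a fresh "OCR Keywords" section appended.
function withOcrKeywords(body, ocrText) {
  const base = stripOcrKeywords(body || "");
  const text = (ocrText || "").trim();
  if (!text) return base; // nothing recognised: leave the body without a section
  return `${base}\n\n${START}\n### OCR Keywords\n\n${text}\n${END}`;
}

const first = withOcrKeywords("App crashes on start", "Mild Splendour of the Night");
const second = withOcrKeywords(first, "Updated text");
console.log(second);
```

Stripping the old section before appending keeps the function idempotent: calling it again on its own output replaces the section instead of adding a second one.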

Development

Install the dependencies

npm install

Run the tests ✔️

$ npm test

 PASS  ./index.test.js
  ✓ empty comment (3 ms)
  ✓ links outside img tag (1 ms)
  ✓ extract text (1 ms)
...

About

An action to automatically extract keywords from images in issue bodies, making them searchable 🔍


MIT License

Languages

JavaScript 100.0%