cyberdrk / OSeeOurSpellWizard

Optical Character Recognition using OpenCV 3.0 with Python

OCR

This project aims to spell-check physical documents on the go. The spell-checking Python script uses simple probability concepts to decide whether a word is spelled correctly. If a misspelling is detected, the script checks whether it can be fixed with 1 or 2 changes (insertion, deletion, swap of adjacent letters, or replacement) and suggests the known word(s) that require the fewest changes. The Optical Character Recognition part uses the k-Nearest Neighbour algorithm and OpenCV to scan text from documents.
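To make the idea concrete, here is a minimal sketch of the approach in the style of Peter Norvig's tutorial. It is not necessarily identical to spellcheckernorvig.py: the function names and the tiny inline `SAMPLE_TEXT` corpus are assumptions for illustration, and it is written for Python 3. In practice the word counts would come from a large corpus such as big.txt.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Tiny stand-in corpus (an assumption for this sketch); a real run would
# read a large text file such as big.txt instead.
SAMPLE_TEXT = (
    "the quick brown fox jumps over the lazy dog "
    "the spelling of a word matters and spelling errors are common"
)


def words(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings one edit away: delete, swap, replace, or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)


def edits2(word):
    """All strings two edits away."""
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(candidates, counts):
    """Keep only the candidates that appear in the corpus."""
    return set(w for w in candidates if w in counts)


def correction(word, counts):
    """Prefer the word itself, then 1-edit words, then 2-edit words.

    Within the chosen group, pick the word seen most often in the corpus.
    """
    candidates = (known([word], counts)
                  or known(edits1(word), counts)
                  or known(edits2(word), counts)
                  or set([word]))
    return max(candidates, key=lambda w: counts[w])


COUNTS = Counter(words(SAMPLE_TEXT))

if __name__ == "__main__":
    for typo in ("speling", "teh", "qick"):
        print(typo + " -> " + correction(typo, COUNTS))
```

Choosing the most frequent candidate is where the "simple probability" comes in: a word that occurs more often in the corpus is treated as the more likely intended spelling.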

Part 1: SpellCheck

Code samples and additional resources on how to make a decent and pretty fast spell checker.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

What you need to install before running the software:

Python 2.7.x

Read up about

Installing

Clone or download the repository.

Open spellcheckernorvig.py in your favorite editor.

Everything is pretty much there. Go ahead and run it!

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

Hats off to Peter Norvig for taking the time to write his amazing tutorial here.

Also, don't forget to check out his personal blog. You'll end up finding hidden gems there 😉

Additional Resources

Most of these are taken (again) from Peter Norvig's spell-checker tutorial. I'm mentioning some of them here:

Statistical Natural Language Processing in Python

Birkbeck Spelling Error Corpus: Computer Readable English Spelling errors

Peter Norvig's code and data for Natural Language Corpus Data: Beautiful Data

Spell-checking by computers, a survey article by Roger Mitton

Spelling-checker tutorial by the LingPipe project

The aspell project and oh! They have better test data here

In case you forgot, our test data is big.txt

Part 2: Optical Character Recognition using k-Nearest Neighbour [stay tuned!]
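Until this part lands, here is a rough sketch of how k-Nearest Neighbour classification works on character images. It is plain Python rather than OpenCV, and it is not this project's implementation: the 3x3 toy glyphs, the `TRAINING` data, and the function names are all assumptions made up for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(sample, training, k=3):
    """Label a sample by majority vote among its k closest training items.

    training is a list of (pixel_vector, label) pairs.
    """
    nearest = sorted(
        training, key=lambda item: squared_distance(sample, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 3x3 glyphs flattened row by row (1 = ink, 0 = background),
# each with two slightly noisy variants.
TRAINING = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "I"),
    ([0, 1, 0, 0, 1, 0, 0, 1, 1], "I"),
    ([1, 1, 0, 0, 1, 0, 0, 1, 0], "I"),
    ([1, 0, 0, 1, 0, 0, 1, 1, 1], "L"),
    ([1, 0, 0, 1, 0, 0, 1, 1, 0], "L"),
    ([1, 0, 0, 1, 0, 0, 0, 1, 1], "L"),
]

if __name__ == "__main__":
    # An "I" with one stray pixel in the bottom-left corner.
    noisy_i = [0, 1, 0, 0, 1, 0, 1, 1, 0]
    print(knn_classify(noisy_i, TRAINING, k=3))
```

A real OCR pipeline would first segment the scanned page into individual character images and resize each one to a fixed size before flattening it into a vector like the ones above.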
