This project aims to spell-check physical documents on the go. The spell-checking Python script uses simple probability concepts to decide whether a word is spelled correctly. If an erratum is detected, it looks for known words within 1 or 2 changes (insertion, deletion, swap, replacement, etc.) of it and suggests the word(s) requiring the fewest changes. The Optical Character Recognition (OCR) uses the k-Nearest Neighbour algorithm and OpenCV to scan text from documents.
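The candidate-generation step can be sketched in Python roughly as in Peter Norvig's tutorial, on which this project is based. This is a minimal sketch, not the code in spellcheckernorvig.py: the toy `WORDS` counts are hypothetical stand-ins for counts taken from a real corpus, and the helper names (`P`, `edits1`, `edits2`, `known`, `candidates`, `correction`) follow Norvig's tutorial rather than anything confirmed for this repository. Candidates needing fewer edits always win; probability only chooses among candidates at the same edit distance.

```python
from collections import Counter

# Hypothetical toy word counts standing in for counts from a real corpus.
WORDS = Counter({'spelling': 5, 'spewing': 1, 'something': 4, 'the': 50,
                 'across': 3, 'actress': 2, 'caress': 1, 'acres': 1})
N = sum(WORDS.values())

LETTERS = 'abcdefghijklmnopqrstuvwxyz'


def P(word):
    """Estimated probability of a word: its count divided by the total count."""
    return WORDS[word] / float(N)


def edits1(word):
    """All strings one edit away: deletion, transposition (swap), replacement, insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away, generated lazily because there can be many."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(words):
    """Keep only the strings that appear in the word counts."""
    return set(w for w in words if w in WORDS)


def candidates(word):
    """Prefer the word itself, then known 1-edit words, then known 2-edit words."""
    return (known([word]) or known(edits1(word))
            or known(edits2(word)) or set([word]))


def correction(word):
    """Most probable candidate among those needing the fewest edits."""
    return max(candidates(word), key=P)
```

With these toy counts, `correction('speling')` returns `'spelling'`: both `'spelling'` and `'spewing'` are one edit away, and `'spelling'` has the higher count. Generating `edits2` lazily keeps memory low, since the two-edit strings for even a short word can number in the tens of thousands.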
Code samples and additional resources on how to build a decent and pretty fast spell checker.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
What you need in order to run the software:
Python 2.7.x, whose standard library provides:
- Regular expressions (the re module): http://www.pitt.edu/~naraehan/python2/re.html
- Counter objects from the collections module: https://docs.python.org/2/library/collections.html
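The two standard-library pieces fit together like this: a regular expression splits text into lower-case words and a `Counter` tallies them, which gives the word probability P(w) = count(w) / total. This is a minimal sketch; the `sample` string is a made-up stand-in for a real corpus file, and `words` and `probability` are illustrative names, not necessarily those used in spellcheckernorvig.py.

```python
import re
from collections import Counter


def words(text):
    """Lower-case the text and extract runs of word characters with a regex."""
    return re.findall(r'\w+', text.lower())


# A made-up sample standing in for a real corpus file.
sample = "The cat sat on the mat. The mat was flat!"

counts = Counter(words(sample))   # word -> number of occurrences
total = sum(counts.values())      # total number of words in the sample


def probability(word):
    """Relative frequency of a word (0.0 for unseen words; Counter returns 0)."""
    return counts[word] / float(total)
```

For this sample, `counts.most_common(2)` returns `[('the', 3), ('mat', 2)]` and `probability('the')` is 0.3.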
Clone or download the repository.
Open spellcheckernorvig.py in your favorite editor.
Everything is pretty much there. Go ahead and run it!
This project is licensed under the MIT License; see the LICENSE.md file for details.
Hats off to Peter Norvig for taking the time to blog his amazing tutorial, "How to Write a Spelling Corrector".
Also, don't forget to check out his personal blog. You'll end up finding hidden gems there 😉
Most of these resources come from (again) Peter Norvig's spell-checker tutorial. Here are some of them:
Statistical Natural Language Processing in Python
Birkbeck Spelling Error Corpus: Computer Readable English Spelling errors
Peter Norvig's code and data for Natural Language Data: Beautiful Data
Spell-checking by computers, a survey article by Roger Mitton
Spelling-checker tutorial by the LingPipe project
The aspell project. Oh, and they have better test data!
In case you forgot, our data is big.txt, the text corpus the word counts are built from.
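The Birkbeck and aspell lists mentioned above pair misspellings with their intended words, which makes it easy to score a corrector. Below is a minimal sketch in the style of Norvig's tutorial, assuming test lines of the form `right: wrong1 wrong2`; `parse_testset`, `spelltest`, and the lookup-table `toy_correct` are hypothetical names, and the table only stands in for a real correction function.

```python
def parse_testset(lines):
    """Turn lines like 'right: wrong1 wrong2' into (right, wrong) pairs."""
    pairs = []
    for line in lines:
        if ':' not in line:
            continue  # skip blank or malformed lines
        right, wrongs = line.split(':', 1)
        for wrong in wrongs.split():
            pairs.append((right.strip(), wrong))
    return pairs


def spelltest(tests, correct):
    """Score a correction function on (right, wrong) pairs.

    Returns the fraction corrected to the intended word and a list of
    (wrong, suggested, right) tuples for the failures.
    """
    good = 0
    failures = []
    for right, wrong in tests:
        suggested = correct(wrong)
        if suggested == right:
            good += 1
        else:
            failures.append((wrong, suggested, right))
    accuracy = good / float(len(tests)) if tests else 0.0
    return accuracy, failures


# Hypothetical stand-in corrector: a fixed lookup table instead of a real model.
FIXES = {'speling': 'spelling', 'korrectud': 'corrected'}


def toy_correct(word):
    return FIXES.get(word, word)


lines = ['spelling: speling', 'corrected: korrectud', 'bicycle: bycycle']
pairs = parse_testset(lines)
accuracy, failures = spelltest(pairs, toy_correct)
```

Here two of the three misspellings are fixed, so `accuracy` is about 0.67 and `failures` holds `('bycycle', 'bycycle', 'bicycle')`. Passing the real correction function instead of `toy_correct` gives a quick accuracy check.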