jacks205 / Spell-Check

Spell Checker in Python

Spell-Check

Use

Cloning and Running Program

git clone https://github.com/jacks205/Spell-Check.git
cd Spell-Check
make 0 or make 1

Removing .pyc files if needed

make realclean

Note: When using generated misspellings, recurring words or letters may appear. This happens because random numbers generated repeatedly are not always completely random.

Algorithm

This spell checker uses an algorithm originally summarized by Dr. Peter Norvig. Source: How to Write a Spelling Corrector. Additional source: Google Algorithm Paper.

The algorithm has three parts:

  • The probability that the typed word is itself correctly spelled
  • The error probability that the user typed word x but meant word y
  • Iteration over all candidate words, choosing the one with the highest probability
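
As a rough illustration of these three parts, here is a condensed sketch in the style of Norvig's essay. It is not this repository's code: the toy `CORPUS` string and the names `P`, `edits1`, `candidates`, and `correction` are assumptions made for the example, and the error model is simplified to "prefer the word itself, then any known word one edit away".

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real run would count words from a large text.
CORPUS = "the sheep and the people saw the sheep wake"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
TOTAL = sum(WORDS.values())


def P(word):
    """Part 1: how likely a word is on its own, from corpus frequency."""
    return WORDS[word] / TOTAL


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Part 2 (simplified error model): the word itself if known,
    otherwise known words one edit away, otherwise the input unchanged."""
    def known(ws):
        return {w for w in ws if w in WORDS}
    return known([word]) or known(edits1(word)) or {word}


def correction(word):
    """Part 3: iterate over the candidates and keep the most probable one."""
    return max(candidates(word), key=P)


print(correction("shep"))  # sheep
```

Unknown inputs with no nearby known word fall through unchanged in this sketch; a fuller version would also look two edits away.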

My altered algorithm avoids scanning the full word list by narrowing the candidates to words that share the input's first letter. With the dictionary grouped by first letter, each lookup scans roughly n/26 words if words are spread evenly across letters, where n is the number of words and 26 is the size of the alphabet. Strictly speaking this is a constant-factor speedup, since O(n/26) is still O(n). If n is a

Main Challenge

Write a program that reads a large list of English words (e.g. from /usr/share/dict/words on a unix system) into memory, and then reads words from stdin, and prints either the best spelling suggestion, or "NO SUGGESTION" if no suggestion can be found. The program should print ">" as a prompt before reading each word, and should loop until killed.

Your solution should be faster than O(n) per word checked, where n is the length of the dictionary. That is to say, you can't scan the dictionary every time you want to spellcheck a word.

For example:

> sheeeeep

sheep

> peepple

people

> sheeple

NO SUGGESTION
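
The prompt behaviour in the session above could be sketched as follows. This is a minimal illustration, not the repository's implementation: `run_prompt` and its `suggest` parameter are hypothetical names, and stopping at end of input (rather than only when killed) is an assumption that makes the loop usable with pipes.

```python
import sys


def run_prompt(suggest, stdin=sys.stdin, stdout=sys.stdout):
    """Print '> ', read one word, print suggest(word), and repeat.

    `suggest` is any callable returning a correction or "NO SUGGESTION".
    """
    while True:
        stdout.write("> ")
        stdout.flush()  # make the prompt visible before blocking on input
        line = stdin.readline()
        if not line:  # end of input, e.g. the upstream pipe was closed
            break
        word = line.strip()
        if word:
            stdout.write(suggest(word) + "\n")


if __name__ == "__main__":
    # Placeholder checker (an assumption for the demo): echoes the word lowercased.
    run_prompt(lambda w: w.lower())
```

Passing the streams as parameters keeps the loop easy to exercise with in-memory input instead of a terminal.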

The class of spelling mistakes to be corrected is as follows:

  • Case (upper/lower) errors: "inSIDE" => "inside"
  • Repeated letters: "jjoobbb" => "job"
  • Incorrect vowels: "weke" => "wake"

Any combination of the above types of error in a single word should be corrected (e.g. "CUNsperrICY" => "conspiracy").

If there are many possible corrections of an input word, your program can choose one in any way you like. It just has to be an English word that is a spelling correction of the input by the above rules.
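
One way to meet the "faster than O(n) per word" requirement for these three error classes is to index the dictionary by a normalized key, so that a lookup costs one hash probe. The sketch below is an assumption about how that might look, not the approach this repository takes: `normalize`, `build_index`, and `suggest` are hypothetical names, and the small word list stands in for /usr/share/dict/words. The key lowercases the word, masks every vowel as `*`, and collapses runs of identical characters. It can over-accept (for example, inputs with a dropped double letter), which the challenge tolerates only loosely.

```python
import re
from collections import defaultdict

VOWELS = "aeiou"


def normalize(word):
    """Key that is identical for a word and its allowed misspellings."""
    masked = re.sub(f"[{VOWELS}]", "*", word.lower())  # case + vowel errors
    return re.sub(r"(.)\1+", r"\1", masked)            # repeated letters


def build_index(words):
    """Map each normalized key to the dictionary words that share it."""
    index = defaultdict(list)
    for w in words:
        index[normalize(w)].append(w)
    return index


def suggest(word, index):
    """Return a dictionary word for the input, or "NO SUGGESTION"."""
    found = index.get(normalize(word))
    if not found:
        return "NO SUGGESTION"
    lowered = word.lower()
    # Prefer an exact match; otherwise take the first word stored for the key.
    return lowered if lowered in found else found[0]


# Stand-in dictionary (an assumption for the demo).
index = build_index(["sheep", "people", "inside", "job", "wake", "conspiracy"])
for typed in ["sheeeeep", "peepple", "sheeple", "CUNsperrICY"]:
    print(typed, "=>", suggest(typed, index))
```

Building the index is a one-time O(n) pass; each later lookup depends only on the length of the typed word.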

Final step: Write a second program that generates words with spelling mistakes of the above form, starting with correctly spelled English words. Pipe its output into the first program and verify that there are no occurrences of "NO SUGGESTION" in the output.
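
A generator for the second program might look like the sketch below. The function name `misspell`, the 30% mutation rates, and the repeat range of 1 to 3 are all assumptions chosen for illustration, not values taken from this repository. Passing in a seeded `random.Random` makes a run reproducible.

```python
import random

VOWELS = "aeiou"


def misspell(word, rng):
    """Apply random vowel swaps, case flips, and letter repeats to word."""
    out = []
    for ch in word.lower():
        if ch in VOWELS and rng.random() < 0.3:
            ch = rng.choice(VOWELS)           # incorrect vowel
        if rng.random() < 0.3:
            ch = ch.upper()                   # case error
        out.append(ch * rng.randint(1, 3))    # repeated letters
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the output can be reproduced
    for correct in ["sheep", "people", "conspiracy"]:
        print(misspell(correct, rng))
```

Each generated line can be piped into the checker. Because only the three listed error classes are applied, consonants keep their order and are never dropped.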

Algorithm Source

Peter Norvig - How to Write a Spelling Corrector
