aravind-gk / Spell-Corrector

A spell correction python program which I wrote in 30 minutes as part of a hands-on coding round at a startup.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Spell-Corrector

A basic spell correction python program which I wrote in 30 minutes as part of a hands-on coding round at a startup.

The algorithm is as follows:

  • Given both the clean and unclean (containing mis-spelled words) sentences, create a vocabulary of known & correct words -- set of all words from the clean sentences
  • For all words in vocabulary, create and store their count-vectors, of size (26x1).
  • For each unclean sentence, split them into individual words and for each word, do the following -
    • If that word is present in our vocabulary of clean words, it's already correct.
    • Else, create a count-vector embedding of size (26x1) for that word
    • Compare the cosine similarity of this word with all words in vocabulary to find the closest match. Then output this highest-similarity match as the clean, corrected spelling for given mis-spelled word.

This very basic and simple algorithm works surprisingly well for most practical situations, but also has some obvious drawbacks -

  • Using this method, we cannot differentiate between words having different spellings but identical count-vectors.
  • The exact position and semantics cannot be captured using count-vectors.
  • This model will struggle with words which have different spellings in different regions, like color and colour.

About

A spell correction python program which I wrote in 30 minutes as part of a hands-on coding round at a startup.


Languages

Language:Jupyter Notebook 100.0%