Logahn / auto-correct-model

An implementation of a spell checker that uses a corpus file to compute word probabilities and suggests corrections for misspelled words by applying edit operations like delete, swap, replace, and insert.

Spell Checker in Python

This is an implementation of a spell checker in Python. It reads a corpus file (in this case, ./english.txt), computes the probability of each word in the corpus, and uses those probabilities to suggest corrections for a given misspelled word.
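The README does not spell out how the word probabilities are computed. A common choice, assumed here, is relative frequency: a word's count divided by the total number of words in the corpus. The sketch below works on an in-memory string instead of ./english.txt, and the names `tokenize` and `compute_probabilities` are hypothetical, not taken from the repository.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return every word in it (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def compute_probabilities(words):
    """Map each distinct word to count / total number of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


words = tokenize("The cat sat. The dog sat too.")
probs = compute_probabilities(words)
print(probs["the"])  # 2 of the 7 words are "the"
```

With this definition the probabilities of all distinct words sum to 1, so they can be compared directly when ranking candidate corrections.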

Functions

The code contains the following functions:

  • read_corpus(filename): reads in a corpus file and returns a list of all words in the file.
  • split(word): returns a list of all possible ways to split a word into two parts.
  • delete(word): returns a list of all possible words that can be generated by deleting one character from the input word.
  • swap(word): returns a list of all possible words that can be generated by swapping adjacent characters in the input word.
  • replace(word): returns a list of all possible words that can be generated by replacing one character in the input word with a letter from the alphabet.
  • insert(word): returns a list of all possible words that can be generated by inserting one character from the alphabet into the input word.
  • edit1(word): returns a set of all possible words that can be generated by applying one edit operation (i.e. delete, swap, replace, or insert) to the input word.
  • edit2(word): returns a set of all possible words that can be generated by applying two edit operations to the input word.
  • correct_spelling(word, vocabulary, word_probabilities): takes a misspelled word and returns a list of suggested corrections, along with their probabilities. Candidates are generated by applying edit operations to the input word, and the candidates found in the vocabulary are ranked by their probability of being the intended word.
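The functions listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: it assumes a lowercase English alphabet, a `vocabulary` passed as a set, a `word_probabilities` dict, and a fallback to two edits only when no one-edit candidate is in the vocabulary.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters only


def split(word):
    """All (left, right) pairs with left + right == word."""
    return [(word[:i], word[i:]) for i in range(len(word) + 1)]


def delete(word):
    """Drop one character at each position."""
    return [left + right[1:] for left, right in split(word) if right]


def swap(word):
    """Exchange each pair of adjacent characters."""
    return [left + right[1] + right[0] + right[2:]
            for left, right in split(word) if len(right) > 1]


def replace(word):
    """Substitute each character with every letter (this includes the word itself)."""
    return [left + c + right[1:]
            for left, right in split(word) if right for c in ALPHABET]


def insert(word):
    """Add every letter at every position."""
    return [left + c + right for left, right in split(word) for c in ALPHABET]


def edit1(word):
    return set(delete(word) + swap(word) + replace(word) + insert(word))


def edit2(word):
    return {e2 for e1 in edit1(word) for e2 in edit1(e1)}


def correct_spelling(word, vocabulary, word_probabilities):
    """Return (candidate, probability) pairs, most probable first."""
    if word in vocabulary:
        return [(word, word_probabilities.get(word, 0.0))]
    # Assumed policy: prefer one-edit candidates, fall back to two edits.
    candidates = (edit1(word) & vocabulary) or (edit2(word) & vocabulary)
    return sorted(((c, word_probabilities.get(c, 0.0)) for c in candidates),
                  key=lambda pair: pair[1], reverse=True)


vocabulary = {"the", "they", "then"}
probs = {"the": 0.6, "they": 0.3, "then": 0.1}
print(correct_spelling("teh", vocabulary, probs))
```

Here "teh" is one swap away from "the", while "they" and "then" each need two edits, so only "the" is suggested.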

SpellChecker Class

The SpellChecker class reads in a corpus file and stores the vocabulary, word counts, and word probabilities. It also provides a method check(word) that takes a misspelled word and returns a list of suggested corrections, sorted by probability.

Usage

To use the spell checker, create an instance of the SpellChecker class with the path to the corpus file as an argument, then call its check(word) method to get the suggested corrections.
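A minimal, self-contained sketch of what the class and its use might look like. The attribute names (`vocabulary`, `word_counts`, `word_probabilities`), the tokenization, and the fallback to two edits are assumptions made for illustration. A small temporary file stands in for ./english.txt so the example runs on its own.

```python
import os
import re
import string
import tempfile
from collections import Counter


def edit1(word):
    """All strings one delete, swap, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


class SpellChecker:
    def __init__(self, corpus_path):
        with open(corpus_path, encoding="utf-8") as f:
            words = re.findall(r"\w+", f.read().lower())
        self.word_counts = Counter(words)
        self.vocabulary = set(words)
        total = sum(self.word_counts.values())
        self.word_probabilities = {w: c / total
                                   for w, c in self.word_counts.items()}

    def check(self, word):
        """Return (candidate, probability) pairs sorted by probability."""
        if word in self.vocabulary:
            return [(word, self.word_probabilities[word])]
        candidates = edit1(word) & self.vocabulary
        if not candidates:  # assumed fallback: try two edits
            candidates = {e2 for e1 in edit1(word)
                          for e2 in edit1(e1)} & self.vocabulary
        return sorted(((c, self.word_probabilities[c]) for c in candidates),
                      key=lambda pair: pair[1], reverse=True)


# Write a tiny throwaway corpus in place of ./english.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("the cat and the hat and the bat")
    corpus_path = tmp.name

checker = SpellChecker(corpus_path)
suggestions = checker.check("cet")
print(suggestions)
os.remove(corpus_path)
```

In this toy corpus only "cat" is within one edit of "cet", so it is the single suggestion. With a real corpus, check would typically return several candidates ordered by their corpus frequency.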

Link

Google colab file can be found here

Languages

Language:Python 100.0%