Three techniques are used to mess up your input text:
- Words are replaced with their common misspellings (generated from the corpus found in `./misspelled`)
- Random letters within a word are capitalized
- Spaces are inserted randomly in the middle of, and in between, words

Each technique can be turned off by commenting it out, and turned back on by uncommenting it. The probabilities can also be changed to mess up the text more or less.
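
The three techniques and their toggles could look roughly like the sketch below. This is not the code in `spelld.py`: the function names, the probability constants and their values, and the in-memory `MISSPELLINGS` dictionary (standing in for the `./misspelled` corpus) are all assumptions made for illustration.

```python
import random

# Hypothetical probabilities; the real values in spelld.py may differ.
P_MISSPELL = 0.3
P_CAPITALIZE = 0.1
P_SPACE = 0.05

# Tiny stand-in for the ./misspelled corpus (assumed shape: word -> misspellings).
MISSPELLINGS = {
    "definitely": ["definately", "definitly"],
    "separate": ["seperate"],
}


def misspell(words, rng, p=P_MISSPELL):
    """Replace known words with one of their common misspellings."""
    return [
        rng.choice(MISSPELLINGS[w.lower()])
        if w.lower() in MISSPELLINGS and rng.random() < p
        else w
        for w in words
    ]


def capitalize_randomly(words, rng, p=P_CAPITALIZE):
    """Upper-case each letter independently with probability p."""
    return ["".join(c.upper() if rng.random() < p else c for c in w) for w in words]


def add_spaces(words, rng, p=P_SPACE):
    """Insert a stray space after each character with probability p.

    A space after the last character of a word ends up between words
    once the words are joined again.
    """
    return ["".join(c + " " if rng.random() < p else c for c in w) for w in words]


def mess_up(text, seed=None):
    rng = random.Random(seed)
    words = text.split()
    # Comment out any of the next three lines to turn that technique off.
    words = misspell(words, rng)
    words = capitalize_randomly(words, rng)
    words = add_spaces(words, rng)
    return " ".join(words)
```

Raising a probability towards `1.0` messes up the text more, and lowering it towards `0.0` leaves more of the text intact.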

To run it:
- Place your text in an `input.txt` file in the root directory of this project
- Tweak `spelld.py` to fit your needs
- Run `spelld.py`
- Get the messed-up text from the `output.txt` file
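
The file flow in the steps above amounts to reading one file, transforming its text, and writing another. A minimal sketch, assuming UTF-8 text files in the working directory; the `run` helper and its `transform` parameter are hypothetical and not taken from `spelld.py`:

```python
from pathlib import Path


def run(transform, input_path="input.txt", output_path="output.txt"):
    """Read input_path, apply transform to its text, write the result to output_path."""
    text = Path(input_path).read_text(encoding="utf-8")
    Path(output_path).write_text(transform(text), encoding="utf-8")


# Example: run(str.swapcase) reads input.txt and writes output.txt
# with the case of every letter flipped.
```

Any function that takes a string and returns a string can be passed as `transform`.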

Possible improvements:
- Rework the techniques so that they can all be applied in a single pass
- Implement a smarter algorithm for generating spelling mistakes (maybe use an additional corpus?)
- Flesh out the text filter to normalize input text