This repository contains resources used in my master's thesis.
- arial.ttf (Arial MT) - Downloaded for free from ufonts.com
- arial-mono.ttf (Arial Monospaced MT) - Downloaded for free from ufonts.com
Note: The parser only validates words with a length greater than three characters. Any non-ASCII characters are removed, and duplicates are removed as well. Because of this, the actual data set is a lot smaller than the size of these wordlists.
- wordlist1.txt - Downloaded from sil.org. 109,582 words
- wordlist2.txt - words.txt file downloaded from github.com/dwyl/english-words. 354,985 words
- wordlist3.txt - Downloaded from English dictionaries for Apache OpenOffice. Only en_US. Preprocessed. 39,908 words
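
The filtering described in the note above could look roughly like the sketch below. This is not the thesis parser itself: the function name `clean_wordlist` is hypothetical, and the order of the steps (strip non-ASCII characters first, then check the length, then drop duplicates) is an assumption, since the note does not specify it.

```python
def clean_wordlist(lines):
    """Return the words that survive the filtering rules, in first-seen order.

    Assumed order of steps (not confirmed by the repository):
    1. strip surrounding whitespace,
    2. remove non-ASCII characters from the word,
    3. keep only words longer than three characters,
    4. drop duplicates.
    """
    seen = set()
    result = []
    for line in lines:
        word = line.strip()
        # Remove every non-ASCII character from the word.
        word = "".join(ch for ch in word if ch.isascii())
        # Only words with a length greater than three are kept.
        if len(word) <= 3:
            continue
        # Skip words that have already been seen.
        if word in seen:
            continue
        seen.add(word)
        result.append(word)
    return result


if __name__ == "__main__":
    sample = ["tree", "cat", "naïve", "tree", "café", "house\n"]
    print(clean_wordlist(sample))  # ['tree', 'nave', 'house']
```

In the sample, "cat" is too short, the second "tree" is a duplicate, and "café" drops to three characters once the "é" is removed, which illustrates why the resulting data set is smaller than the raw word counts listed above.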