gutfeeling / word_forms

Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc.

Any way to avoid the download every time?

Ralithune opened this issue

I'm writing an anagram solver, and using this to help pull different versions of words in the dictionary like "weakened" vs. "weaken" - "weaken" is in the dictionary file I'm using, but "weakened" is not because it's a different form of the base word.

Every time I run my program, there's a 7 or so second delay while word_forms downloads files and sets things up - is there any way to just have it use the files it downloaded last time?

It looks like maybe the delay is in the setup of nltk and the wordnet? What's it doing, exactly? Why does it take so long on each run?

word_forms needs a few seconds of startup time because of the following lines in https://github.com/gutfeeling/word_forms/blob/master/word_forms/constants.py:

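from nltk.corpus import wordnet as wn

# Collect every lemma name in WordNet; walking all synsets is the slow part.
ALL_WORDNET_WORDS = set()
for synset in list(wn.all_synsets()):
    for lemma in synset.lemmas():
        ALL_WORDNET_WORDS.add(lemma.name())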

If you want to avoid the startup time, consider computing ALL_WORDNET_WORDS and persisting it in a file.
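One way to do that could look roughly like the sketch below. It is not part of word_forms: the names `build_wordnet_words` and `load_or_build_words` and the cache file name are made up for illustration, and it assumes a plain text file with one word per line is good enough (WordNet lemma names use underscores instead of spaces, so they fit on one line each).

```python
from pathlib import Path


def build_wordnet_words():
    """Rebuild the set the slow way, by walking every WordNet synset."""
    # Imported lazily so the fast path never has to touch nltk at all.
    # Assumes nltk and its WordNet corpus are installed.
    from nltk.corpus import wordnet as wn

    return {lemma.name() for synset in wn.all_synsets() for lemma in synset.lemmas()}


def load_or_build_words(cache_path, build=build_wordnet_words):
    """Return the cached word set if the file exists, else build it and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: read one word per line from the cache file.
        with cache_path.open(encoding="utf-8") as f:
            return {line.rstrip("\n") for line in f if line.strip()}
    # Slow path: compute the set once, then persist it for the next run.
    words = set(build())
    with cache_path.open("w", encoding="utf-8") as f:
        f.write("\n".join(sorted(words)))
    return words
```

With something like this, `ALL_WORDNET_WORDS = load_or_build_words("all_wordnet_words.txt")` would only pay the full cost on the first run. Deleting the file forces a rebuild, for example after updating the WordNet data.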

Thank you - that reduced startup time from 15 seconds to 5. Is there some reason this isn't done by default? I imagine a method for re-downloading and re-generating the file wouldn't be too much of a hassle, but using a local one by default seems like a more user-friendly approach given that it's only about 1.8 megs.

I guess I could write something and submit it for pull request, but I'm kinda new to contributing.

Good point. Yes, if you want, send me a pull request. No worries about being new to contributing.