Removal of apostrophes, hyphens and things.
AllBecomesGood opened this issue
So in eda.py you remove several things like:
line = line.replace("’", "")
line = line.replace("'", "")
line = line.replace("-", " ")
And I was wondering why that is. While this augmentation method improved my results dramatically, I now need to somehow get data back in that lets the bot learn that "I'm" is the same as "I am", etc., because the data now only ever includes "im".
Is this some limitation of WordNet or something?
I don't know if WordNet is the limitation, but I don't think so. Basically, having the punctuation makes EDA more complicated, so I removed it. You're welcome to add it back, and if you have a good solution, feel free to send a PR.
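One possible workaround, sketched here as an assumption rather than anything that exists in eda.py, is to expand contractions before the apostrophes are stripped, so "I'm" ends up as "i am" instead of "im". The names `CONTRACTIONS`, `expand_contractions`, and `clean_line` are hypothetical, and the mapping is deliberately tiny; it would need extending for real data.

```python
import re

# Hypothetical, deliberately small mapping; not part of eda.py.
# Some contractions are ambiguous ("it's" can mean "it has"), so review
# any entries you add against your own data.
CONTRACTIONS = {
    "i'm": "i am",
    "you're": "you are",
    "it's": "it is",
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
}

# One case-insensitive pattern that matches any key as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(line):
    # Normalize the typographic apostrophe so both forms match the table.
    line = line.replace("\u2019", "'")
    # The expansion is emitted in lowercase regardless of the input case.
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], line)


def clean_line(line):
    # Expand first, then apply the same three replacements quoted above.
    line = expand_contractions(line)
    line = line.replace("\u2019", "")
    line = line.replace("'", "")
    line = line.replace("-", " ")
    return line


print(clean_line("I'm well-known"))
```

Contractions missing from the table still collapse as before (for example "dog's" becomes "dogs"), so this only helps for the forms you list explicitly.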