Results of regular expression for exceptions are sometimes incorrect
lmeurs opened this issue
First of all, thanks again for a great script! :-)
Exceptions are turned into regular expressions so they can easily hyphenate words case insensitively (see Hypher()):
this.exceptions[exceptions[i].replace(/\u2027/g, '').toLowerCase()] = new RegExp(exceptions[i].split('\u2027').join('|'), 'ig');
This line of code joins the syllables with an OR character ("|") as glue. Unfortunately, this can produce incorrect arrays of syllables. Take the Dutch word inspirerend with the exception /inspi|re|rend/ig: matching returns the array ["inspi", "re", "re"] instead of ["inspi", "re", "rend"]. Because the alternatives are tried from left to right, the third syllable "rend" is already matched by the second alternative "re", and that shorter match is what .match() returns.
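To make the problem easy to reproduce outside of Hypher, here is a minimal sketch. It assumes the exception is stored as "inspi‧re‧rend" (syllables separated by \u2027); the variable names are only for illustration and do not come from the library.

```javascript
// Assumed exception string: syllables separated by U+2027 (hyphenation point).
var exception = 'inspi\u2027re\u2027rend';

// Same construction as the line quoted above: syllables joined with "|".
var alternation = new RegExp(exception.split('\u2027').join('|'), 'ig');

// With the g flag, .match() returns every match in order. At the third
// position the alternative "re" succeeds before "rend" is ever tried.
var buggyResult = 'inspirerend'.match(alternation);

console.log(alternation.source); // inspi|re|rend
console.log(buggyResult);        // [ 'inspi', 're', 're' ]
```

The trailing "nd" is silently dropped, so joining the syllables no longer gives back the original word.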
A solution might be to use regexps like /(inspi)(re)(rend)/i, i.e.:
this.exceptions[exceptions[i].replace(/\u2027/g, '').toLowerCase()] = new RegExp('(' + exceptions[i].split('\u2027').join(')(') + ')', 'i');
The result differs slightly: without the g flag, .match() returns the array ["inspirerend", "inspi", "re", "rend", index: 0, input: "inspirerend"], whose first element is the full match. Since the script needs an array with only syllables, we have to shift the array to remove the first element (see Hypher.prototype.hyphenate()):
return word.match(this.exceptions[word.toLowerCase()]);
can become:
var regexpResult = word.match(this.exceptions[word.toLowerCase()]);
regexpResult.shift();
return regexpResult;
(the variable regexpResult should be declared with the other variables in this function)
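Put together, the proposal could look roughly like the sketch below. The helper names buildExceptionPattern and splitByException are hypothetical and not part of Hypher. The sketch also uses slice(1) instead of shift(), which is a small deviation from the snippet above: it drops the full match and returns a plain array without the index and input properties. The null check is an extra safeguard I added for the standalone example.

```javascript
// Hypothetical helper: builds one capture group per syllable,
// e.g. "inspi‧re‧rend" becomes /(inspi)(re)(rend)/i.
function buildExceptionPattern(exception) {
  return new RegExp('(' + exception.split('\u2027').join(')(') + ')', 'i');
}

// Hypothetical helper: returns only the syllables for a word.
function splitByException(word, pattern) {
  var regexpResult = word.match(pattern);
  if (regexpResult === null) {
    // Safeguard for this sketch only: no match, return the word unsplit.
    return [word];
  }
  // Element 0 is the full match; the capture groups start at index 1.
  return regexpResult.slice(1);
}

var pattern = buildExceptionPattern('inspi\u2027re\u2027rend');
var syllables = splitByException('Inspirerend', pattern);

console.log(pattern.source); // (inspi)(re)(rend)
console.log(syllables);      // [ 'Inspi', 're', 'rend' ]
```

Because the groups have to match in sequence, "rend" can no longer be cut short by "re", and the i flag keeps the original casing of the word in the result.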
Excellent bug report. Thanks a lot! I've fixed this bug in 5b2de04 and also added a test case so it doesn't happen again.
Thanks again.