bramstein / hypher

A fast and small JavaScript hyphenation engine

Results of regular expression for exceptions are sometimes incorrect

lmeurs opened this issue

First of all, thanks again for what is still a great script! :-)

Exceptions are turned into regular expressions so they can easily hyphenate words case-insensitively (see Hypher()):

this.exceptions[exceptions[i].replace(/\u2027/g, '').toLowerCase()] = new RegExp(exceptions[i].split('\u2027').join('|'), 'ig');

This line joins the syllables with an OR character ("|"), which can unfortunately produce an incorrect array of syllables. Take the Dutch word inspirerend with the exception regex /inspi|re|rend/ig: .match() returns ["inspi", "re", "re"] instead of ["inspi", "re", "rend"]. Because alternatives are tried from left to right, the start of the third syllable, rend, already matches the second alternative, re, so .match() returns re again and the remaining nd matches nothing.
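The behaviour described above can be reproduced outside the library with a few lines. This is only a minimal sketch: splitWithAlternation is a hypothetical helper that mirrors the line quoted from Hypher(), not a function that exists in hypher.

```javascript
// Hypothetical helper (not part of hypher): builds the same kind of
// alternation regex as the quoted line and applies it to the bare word.
function splitWithAlternation(exception) {
  var pattern = new RegExp(exception.split('\u2027').join('|'), 'ig');
  var word = exception.replace(/\u2027/g, '');
  return word.match(pattern);
}

// "inspi‧re‧rend" becomes /inspi|re|rend/ig.
var alternationResult = splitWithAlternation('inspi\u2027re\u2027rend');

// The last syllable comes back as "re" because that alternative is tried
// before "rend" at the same position.
console.log(alternationResult); // [ 'inspi', 're', 're' ]
```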

A solution might be to use regexps like /(inspi)(re)(rend)/i, i.e.:

this.exceptions[exceptions[i].replace(/\u2027/g, '').toLowerCase()] = new RegExp('(' + exceptions[i].split('\u2027').join(')(') + ')', 'i');

The result differs slightly: .match() now returns the array ["inspirerend", "inspi", "re", "rend", index: 0, input: "inspirerend"]. Since the script needs an array containing only the syllables, we have to shift the array to remove the first element, which is the full match (see Hypher.prototype.hyphenate()):

return word.match(this.exceptions[word.toLowerCase()]);

can become:

var regexpResult = word.match(this.exceptions[word.toLowerCase()]);
regexpResult.shift();
return regexpResult;

(the variable regexpResult should be declared with the other variables in this function)
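Putting both proposed changes together, the idea can be tried in isolation. The sketch below is an assumption about how the pieces fit: buildExceptions and hyphenateException are hypothetical stand-ins for the relevant parts of Hypher() and Hypher.prototype.hyphenate(), and it assumes the word is a known exception whose syllables contain no regex metacharacters.

```javascript
// Hypothetical stand-in for the exception setup in Hypher():
// each syllable becomes its own capturing group.
function buildExceptions(list) {
  var exceptions = {};
  var i;
  var key;
  for (i = 0; i < list.length; i += 1) {
    key = list[i].replace(/\u2027/g, '').toLowerCase();
    exceptions[key] = new RegExp('(' + list[i].split('\u2027').join(')(') + ')', 'i');
  }
  return exceptions;
}

// Hypothetical stand-in for the exception branch of hyphenate().
// Assumes `word` is present in `exceptions`.
function hyphenateException(exceptions, word) {
  var regexpResult = word.match(exceptions[word.toLowerCase()]);
  // Remove the full match at index 0 so only the captured syllables remain.
  regexpResult.shift();
  return regexpResult;
}

var exceptions = buildExceptions(['inspi\u2027re\u2027rend']);
var fixedResult = hyphenateException(exceptions, 'Inspirerend');

// The original casing of the input word is preserved in the syllables.
console.log(fixedResult.join('-')); // Inspi-re-rend
```

Without the g flag, .match() returns the capture groups in order, so each syllable is taken from its own position in the word rather than from whichever alternative happens to match first.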

Excellent bug report. Thanks a lot! I've fixed this bug in 5b2de04 and also added a test case so it doesn't happen again.

Thanks again.