Words that include an umlaut are not being hyphenated
deboerk opened this issue · comments
Hi,
I have the following issue with your great hypher script: Words that contain an umlaut don't get hyphenated. Here is an example:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.hypher.js"></script>
<script type="text/javascript" src="de.js"></script>
<style>
/* Narrow width forces a line break at every hyphenation point */
#foo { width: 5px; }
</style>
</head>
<body>
<div id="foo">müsse musse</div>
<script>
$("#foo").hyphenate("de");
</script>
</body>
</html>
The word "musse" is hyphenated as expected ("mus-se"), whereas "müsse" is not hyphenated at all ("müsse"). I tried to add an exception (müs‧se), but that didn't help. All my files are encoded in UTF-8. Can you help me out? Thank you!
Regards
deboerk
Same issue:
Hypher.languages.de.hyphenateText("sozioökonomisch").replace(/\u00AD/g, "|")
"so|zioöko|no|misch"
Hypher.languages.de.hyphenateText("Kostenschätzungen").replace(/\u00AD/g, "|")
"Kos|tenschätzun|gen"
The hyphenation looks better when the umlauts are replaced with plain vowels:
Hypher.languages.de.hyphenateText("soziookonomisch").replace(/\u00AD/g, "|")
"so|zio|o|ko|no|misch"
Hypher.languages.de.hyphenateText("Kostenschatzungen").replace(/\u00AD/g, "|")
"Kos|ten|schat|zun|gen"
In the original patterns file http://tug.org/svn/texhyphen/trunk/collaboration/repository/hyphenator/de_1996.js?view=markup there is a property specialChars : 'ßàáâäçèéêëíñóôöü' which is omitted in de.js and is not used in jquery.hypher.js. Maybe this is related.
Sorry for the late response. This is indeed caused by the special characters. I forgot that JavaScript's regular expressions are not Unicode-aware: character classes such as \w match only ASCII word characters, so a word is cut off at every umlaut. I've created a new pull request (#15) that attempts to fix this. Let me know if it resolves the issue for you.
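For readers who want to see the effect in isolation, here is a minimal sketch of the problem. It is not Hypher's actual code: splitWords, asciiWords and germanWords are hypothetical names, and the assumption is that the text is split into words with an ASCII-only pattern before each word is hyphenated. The character list is taken from the specialChars property quoted above.

```javascript
// Hypothetical helper (not part of Hypher): split text into word and
// non-word chunks. The pattern must contain exactly one capture group so
// that String.prototype.split keeps the matched words in the result.
function splitWords(text, wordPattern) {
  return text.split(wordPattern).filter(function (chunk) {
    return chunk.length > 0;
  });
}

// ASCII-only pattern: \w is [A-Za-z0-9_], so "ü" (U+00FC) ends the word.
var asciiWords = /(\w+)/;

// Assumed workaround: add the German special characters explicitly.
// The "i" flag lets the same class match uppercase forms such as "Ä".
var germanWords = /([\wßàáâäçèéêëíñóôöü]+)/i;

// "m\u00FCsse" is "müsse", written with an escape to avoid encoding issues.
console.log(splitWords("m\u00FCsse musse", asciiWords));
console.log(splitWords("m\u00FCsse musse", germanWords));
console.log(splitWords("sozio\u00F6konomisch", asciiWords));
```

With the ASCII-only pattern, "müsse" falls apart into "m", "ü" and "sse", and "sozioökonomisch" into "sozio", "ö" and "konomisch". If each fragment is then hyphenated on its own, that would be consistent with the results reported above: "so|zioöko|no|misch" has no break next to the umlaut, and the fragments of "müsse" are too short to be hyphenated at all.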
Yep, it looks better now. Thanks a lot for the fix!
Thanks for testing! This is now released as v0.2.1. Thanks again both!