Potential misuse of multi-byte character in regex.split
sgbeal opened this issue
Line 605 in 6f93cab
The `++a` there advances by a single byte through what could be, unless i'm sorely misunderstanding the code (which i might be), a multi-byte character. That would hand the next call to `js_doregexec()` a string which starts part of the way through a multi-byte character.
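To make the concern concrete, here is a minimal sketch of stepping over one whole encoded character instead of one byte. The helpers `seq_len` and `next_char` are hypothetical names for illustration, not mujs functions, and the sketch only inspects the lead byte, so it assumes otherwise well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of mujs: length in bytes of the UTF-8
 * (or CESU-8) sequence starting at s, judged by its lead byte alone.
 * A stray continuation byte or invalid lead byte counts as length 1,
 * so a caller always makes progress on non-empty input. */
static size_t seq_len(const char *s)
{
    unsigned char c;
    assert(s != NULL);
    c = (unsigned char)*s;
    if (c < 0x80) return 1;              /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;    /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;    /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;    /* 11110xxx */
    return 1;                            /* continuation or invalid byte */
}

/* Hypothetical helper: advance past one encoded character.
 * Stops early at the terminating NUL, so a truncated sequence
 * never walks off the end of the string. */
static const char *next_char(const char *s)
{
    size_t n = seq_len(s);
    while (n > 0 && *s != '\0') {
        ++s;
        --n;
    }
    return s;
}
```

With something like this, an `a = next_char(a)` in place of a bare `++a` would keep the start pointer on a character boundary. Whether that is the right fix for the line in question depends on how the surrounding code uses `a`, which i haven't verified.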
Keep in mind that currently mujs strings are CESU-8 which is different than UTF-8 for codepoints higher than U+FFFF, and mujs itself currently has a known issue when such codepoints appear in a source file (hopefully fixed soon).
This may or may not be related to your issue, but it is related to multi-byte codepoints (specifically, 4-byte codepoints).
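For reference, the difference between the two encodings can be sketched as follows. The function names are hypothetical and this is not mujs code; it only illustrates that CESU-8 stores a codepoint above U+FFFF as a UTF-16 surrogate pair, with each half written as its own 3-byte sequence, where UTF-8 uses a single 4-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode a value of at most 0xFFFF as 1 to 3 bytes.
 * Surrogate values are deliberately allowed, since CESU-8 needs them. */
static size_t encode_bmp(unsigned int cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* UTF-8: codepoints above U+FFFF become one 4-byte sequence. */
static size_t encode_utf8(unsigned int cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp <= 0xFFFF) return encode_bmp(cp, out);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* CESU-8: codepoints above U+FFFF become a surrogate pair,
 * each half encoded separately as 3 bytes (6 bytes total). */
static size_t encode_cesu8(unsigned int cp, unsigned char *out)
{
    size_t n;
    assert(cp <= 0x10FFFF);
    if (cp <= 0xFFFF) return encode_bmp(cp, out);
    cp -= 0x10000;
    n = encode_bmp(0xD800 + (cp >> 10), out);        /* high surrogate */
    n += encode_bmp(0xDC00 + (cp & 0x3FF), out + n); /* low surrogate */
    return n;
}
```

For U+1F600 this gives `F0 9F 98 80` in UTF-8 and `ED A0 BD ED B8 80` in CESU-8. One consequence is that a CESU-8 string never contains a 4-byte sequence, so code that steps by lead byte moves one surrogate half at a time for such codepoints.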
i haven't had an issue with it, i just came across it while looking into #130 and it looked suspicious.