ccxvii / mujs

An embeddable JavaScript interpreter in C.

Home Page: http://mujs.com/

Potential misuse of multi-byte character in regex.split

sgbeal opened this issue

mujs/jsstring.c

Line 605 in 6f93cab

++a;

The ++a there is incrementing what could be, unless i'm sorely misunderstanding the code (which i might be), a pointer into a multi-byte character. That would hand the next call to js_doregexec() a string that starts part of the way through a multi-byte character.
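To illustrate the concern, here is a minimal sketch of advancing a pointer by one whole encoded character instead of one byte. The names next_char and char_bytes are hypothetical and not part of mujs; the sketch only assumes that continuation bytes have the form 10xxxxxx, and it does not validate the encoding.

```c
#include <stddef.h>

/* Hypothetical helper, not mujs API: step past one encoded character.
 * Skips the byte at s, then any continuation bytes (10xxxxxx) after it.
 * Never moves past the terminating NUL. */
static const char *next_char(const char *s)
{
	if (*s == '\0')
		return s;
	++s;
	while (((unsigned char)*s & 0xC0) == 0x80)
		++s;
	return s;
}

/* Hypothetical helper: number of bytes next_char() would skip at s. */
static size_t char_bytes(const char *s)
{
	return (size_t)(next_char(s) - s);
}
```

Under that assumption, a plain ++a on a string starting with the bytes C3 A9 would land on A9, while a = next_char(a) would land on the byte after it. A codepoint above U+FFFF stored as two 3-byte surrogate sequences would be stepped over as two separate characters, one per call.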

Keep in mind that mujs strings are currently CESU-8, which differs from UTF-8 for codepoints above U+FFFF, and mujs itself currently has a known issue when such codepoints appear in a source file (hopefully fixed soon).

This may or may not be related to your issue, but it is related to multi-byte codepoints (specifically, 4-byte codepoints).

i haven't had an issue with it; i just came across it while looking into #130, and it looked suspicious.