cktan / tomlc99

TOML C library

question: reason for "len" in toml_utf8_to_ucs?

gregfi opened this issue

What is the reason for the "len" variable in toml_utf8_to_ucs? Is it to prevent parsing beyond the null terminator? Thanks.

The leading bits of the first byte indicate the length of the encoded character in bytes (1 to 4 in standard UTF-8 per RFC 3629; the original UTF-8 scheme allowed up to 6). Once we know the length of the character, we must make sure that the whole character is actually stored in the string buffer that was passed in.

For example, if the leading bits indicate a 3-byte character and only 2 bytes remain in the buffer, we return an error instead of reading past the end.
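A minimal C sketch of the check described above follows. The function name `utf8_decode_bounded` is hypothetical and this is not tomlc99's actual source. It assumes standard UTF-8 (at most 4 bytes per character, RFC 3629) and a buffer of `len` bytes that need not be null-terminated. The point it shows is the order of operations: the sequence length is read from the lead byte, compared against `len`, and only then are the continuation bytes read.

```c
#include <stdint.h>

/* Hypothetical bounded UTF-8 decoder (illustrative sketch, not tomlc99 code).
 * Assumes RFC 3629 UTF-8: at most 4 bytes per character.
 * On success, stores the code point in *ret and returns the number of bytes
 * consumed. Returns -1 if the sequence is malformed or would run past the
 * end of the len-byte buffer. */
int utf8_decode_bounded(const char *orig, int len, int64_t *ret)
{
    const unsigned char *buf = (const unsigned char *)orig;
    int nbytes;
    int64_t v;

    if (len < 1)
        return -1;                       /* nothing left to read */

    /* The high bits of the lead byte encode the sequence length. */
    if (buf[0] < 0x80) {                 /* 0xxxxxxx: 1 byte */
        nbytes = 1;
        v = buf[0];
    } else if ((buf[0] >> 5) == 0x6) {   /* 110xxxxx: 2 bytes */
        nbytes = 2;
        v = buf[0] & 0x1f;
    } else if ((buf[0] >> 4) == 0xe) {   /* 1110xxxx: 3 bytes */
        nbytes = 3;
        v = buf[0] & 0x0f;
    } else if ((buf[0] >> 3) == 0x1e) {  /* 11110xxx: 4 bytes */
        nbytes = 4;
        v = buf[0] & 0x07;
    } else {
        return -1;                       /* stray continuation or invalid lead byte */
    }

    /* The bounds check that "len" exists for: a 3-byte character with only
     * 2 bytes left in the buffer is an error, not a read past the end. */
    if (nbytes > len)
        return -1;

    /* Every remaining byte must be a continuation byte: 10xxxxxx. */
    for (int i = 1; i < nbytes; i++) {
        if ((buf[i] >> 6) != 0x2)
            return -1;
        v = (v << 6) | (buf[i] & 0x3f);
    }

    *ret = v;
    return nbytes;
}
```

With the euro sign (`E2 82 AC`, U+20AC), passing `len = 3` decodes 3 bytes, while passing only the first two bytes with `len = 2` returns -1 before any continuation byte is touched. This sketch does not reject overlong encodings or surrogate code points; a stricter decoder would add those checks after the loop.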