Malformed url can cause a bad hostText TextRange struct
schwehr opened this issue · comments
I am finding that malformed URLs like //:%aa@ cause uriParseUriA to generate invalid TextRange results. Found via fuzzing with ASAN. I am not sure how to correctly fix uriparser, so I can only offer this defensive patch.
/* TODO(schwehr): When this is true, it indicates a bug in the underlying */
/* parser that must be fixed. e.g. "//:%aa@" results in a bad hostText. */
static int URI_FUNC(TextRangeInvalid)(const URI_TYPE(TextRange) *range) {
    /* Okay to both be nullptr. */
    if (range->first == NULL && range->afterLast == NULL) return URI_FALSE;
    if (range->first == NULL && range->afterLast != NULL) return URI_TRUE;
    if (range->first != NULL && range->afterLast == NULL) return URI_TRUE;
    /* Smaller than empty string or swapped begin <-> end */
    if (range->first > range->afterLast) {
        return URI_TRUE;
    }
    return URI_FALSE;
}
Here is a quick band-aid that calls TextRangeInvalid at the end of ParseUriEx to keep the trouble from propagating up:
int URI_FUNC(ParseUriEx)(URI_TYPE(ParserState) * state, const URI_CHAR * first, const URI_CHAR * afterLast) {
    const URI_CHAR * afterUriReference;
    URI_TYPE(Uri) * uri;

    /* Check params */
    if ((state == NULL) || (first == NULL) || (afterLast == NULL)) {
        return URI_ERROR_NULL;
    }
    uri = state->uri;

    /* Init parser */
    URI_FUNC(ResetParserStateExceptUri)(state);
    URI_FUNC(ResetUri)(uri);

    /* Parse */
    afterUriReference = URI_FUNC(ParseUriReference)(state, first, afterLast);
    if (afterUriReference == NULL) {
        return state->errorCode;
    }
    if (afterUriReference != afterLast) {
        URI_FUNC(StopSyntax)(state, afterUriReference);
        return state->errorCode;
    }

    /* BEGIN MODIFICATION */
    if (URI_FUNC(TextRangeInvalid)(&uri->scheme)) {
        fprintf(stderr, "Bad scheme\n");
        return URI_ERROR_SYNTAX;
    }
    if (URI_FUNC(TextRangeInvalid)(&uri->userInfo)) {
        fprintf(stderr, "Bad userInfo\n");
        return URI_ERROR_SYNTAX;
    }
    if (URI_FUNC(TextRangeInvalid)(&uri->hostText)) {
        fprintf(stderr, "Bad hostText\n");
        return URI_ERROR_SYNTAX;
    }
    if (URI_FUNC(TextRangeInvalid)(&uri->portText)) {
        fprintf(stderr, "Bad portText\n");
        return URI_ERROR_SYNTAX;
    }
    if (URI_FUNC(TextRangeInvalid)(&uri->query)) {
        fprintf(stderr, "Bad query\n");
        return URI_ERROR_SYNTAX;
    }
    if (URI_FUNC(TextRangeInvalid)(&uri->fragment)) {
        fprintf(stderr, "Bad fragment\n");
        return URI_ERROR_SYNTAX;
    }
    /* END MODIFICATION */

    return URI_SUCCESS;
}
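For anyone who wants to try the range check outside the library, here is a minimal standalone sketch. The TextRange struct and text_range_invalid below are hypothetical stand-ins (char only, without the URI_FUNC/URI_TYPE macros), not uriparser's own definitions, and the logic is condensed from the TextRangeInvalid patch above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's text range type (char variant). */
typedef struct {
    const char *first;      /* first character of the range */
    const char *afterLast;  /* one past the last character of the range */
} TextRange;

/* Returns 1 if the range is inconsistent, 0 if it is usable.
 * Both pointers must point into the same string (or both be NULL)
 * for the pointer comparison to be meaningful. */
static int text_range_invalid(const TextRange *range) {
    /* Both NULL means "component not present", which is fine. */
    if (range->first == NULL && range->afterLast == NULL) return 0;
    /* Exactly one NULL pointer is never valid. */
    if (range->first == NULL || range->afterLast == NULL) return 1;
    /* afterLast before first would give a negative length. */
    return range->first > range->afterLast;
}
```

With s pointing at "//:%aa@", a range of { s + 7, s + 7 } (an empty string) passes, while { s + 7, s + 2 } is flagged because afterLast lies before first.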
Will have a closer look as soon as time permits. Thumbs up for the report!
Thanks! I have been running a fuzzer for a week with 28 cores and have yet to find anything else. However, my fuzzing code doesn't have all the API calls and only covers ASCII/char. I build with wchar_t disabled.
I've done a bit of research on case //:%aa@ by now (but there is more to do).
What I found so far is that:
- URI //:%aa@ is well-formed despite looking weird (see grammar trace below)
- The hostname of a URI can be empty (see grammar trace below)
- hostText.first points to right after @, that looks okay
- hostText.afterLast points to : way earlier (and is set by ParseOwnHostUserInfoNz where I would not expect it)
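To make the offsets concrete, here is a rough sketch of where I would expect the ranges to land. It is not uriparser code: OffsetRange and split_authority are hypothetical names, it only handles input starting with "//", it does not validate characters, and it leaves any ":port" attached to the host.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical half-open range [first, afterLast) of offsets into the input. */
typedef struct {
    size_t first;
    size_t afterLast;
} OffsetRange;

/* Splits the authority of a string starting with "//" into userinfo and
 * host (plus port, if any). Returns 1 if userinfo is present, 0 if not,
 * and -1 if the input does not start with "//". */
static int split_authority(const char *s, OffsetRange *userinfo,
                           OffsetRange *hostport) {
    size_t end, at, i;

    if (strncmp(s, "//", 2) != 0) return -1;

    /* The authority ends at the first '/', '?' or '#', or at end of input. */
    end = 2 + strcspn(s + 2, "/?#");

    /* userinfo cannot contain '@', so the first '@' terminates it. */
    at = end;
    for (i = 2; i < end; i++) {
        if (s[i] == '@') {
            at = i;
            break;
        }
    }

    if (at < end) {
        userinfo->first = 2;
        userinfo->afterLast = at;
        hostport->first = at + 1;
    } else {
        /* No userinfo: report an empty range at the start of the authority. */
        userinfo->first = 2;
        userinfo->afterLast = 2;
        hostport->first = 2;
    }
    hostport->afterLast = end;
    return at < end;
}
```

For "//:%aa@" this gives userinfo [2, 6), i.e. ":%aa", and an empty host [7, 7). The observed hostText instead starts at offset 7 and ends at the ":" at offset 2, so afterLast lies 5 characters before first.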
For the (expected) grammar trace:
Case: //:%aa@
URI-reference = URI / relative-ref
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
=> eats "//"
authority = [ userinfo "@" ] host [ ":" port ]
=> eats ":%aa" and then "@", hostname can be empty (see reg-name below)
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
host = IP-literal / IPv4address / reg-name
reg-name = *( unreserved / pct-encoded / sub-delims )