uriparser / uriparser

Strictly RFC 3986 compliant URI parsing and handling library written in C89; moved from SourceForge to GitHub

Home Page: https://uriparser.github.io/

Malformed url can cause a bad hostText TextRange struct

schwehr opened this issue

I am finding that malformed URLs like //:%aa@ cause uriParseUriA to generate invalid TextRange results. Found via fuzzing with ASAN. I am not sure how to correctly fix uriparser, so I can only offer this defensive patch.

/* TODO(schwehr): When this is true, it indicates a bug in the underlying */
/*   parser that must be fixed. e.g. "//:%aa@" results in a bad hostText. */
static int URI_FUNC(TextRangeInvalid)(const URI_TYPE(TextRange) *range) {
  /* Okay for both to be NULL. */
  if (range->first == NULL && range->afterLast == NULL) return URI_FALSE;

  if (range->first == NULL && range->afterLast != NULL) return URI_TRUE;
  if (range->first != NULL && range->afterLast == NULL) return URI_TRUE;

  /* Smaller than empty string or swapped begin <-> end */
  if (range->first > range->afterLast) {
    return URI_TRUE;
  }

  return URI_FALSE;
}

Here is a quick band-aid that uses it in ParseUriEx to keep the bad ranges from propagating up:

int URI_FUNC(ParseUriEx)(URI_TYPE(ParserState) * state, const URI_CHAR * first, const URI_CHAR * afterLast) {
	const URI_CHAR * afterUriReference;
	URI_TYPE(Uri) * uri;

	/* Check params */
	if ((state == NULL) || (first == NULL) || (afterLast == NULL)) {
		return URI_ERROR_NULL;
	}
	uri = state->uri;

	/* Init parser */
	URI_FUNC(ResetParserStateExceptUri)(state);
	URI_FUNC(ResetUri)(uri);

	/* Parse */
	afterUriReference = URI_FUNC(ParseUriReference)(state, first, afterLast);
	if (afterUriReference == NULL) {
		return state->errorCode;
	}
	if (afterUriReference != afterLast) {
		URI_FUNC(StopSyntax)(state, afterUriReference);
		return state->errorCode;
	}

	/* BEGIN MODIFICATION */
	if (URI_FUNC(TextRangeInvalid)(&uri->scheme)) {
		fprintf(stderr, "Bad scheme\n");
		return URI_ERROR_SYNTAX;
	}
	if (URI_FUNC(TextRangeInvalid)(&uri->userInfo)) {
		fprintf(stderr, "Bad userInfo\n");
		return URI_ERROR_SYNTAX;
	}
	if (URI_FUNC(TextRangeInvalid)(&uri->hostText)) {
		fprintf(stderr, "Bad hostText\n");
		return URI_ERROR_SYNTAX;
	}
	if (URI_FUNC(TextRangeInvalid)(&uri->portText)) {
		fprintf(stderr, "Bad portText\n");
		return URI_ERROR_SYNTAX;
	}
	if (URI_FUNC(TextRangeInvalid)(&uri->query)) {
		fprintf(stderr, "Bad query\n");
		return URI_ERROR_SYNTAX;
	}
	if (URI_FUNC(TextRangeInvalid)(&uri->fragment)) {
		fprintf(stderr, "Bad fragment\n");
		return URI_ERROR_SYNTAX;
	}
	/* END MODIFICATION */
	return URI_SUCCESS;
}
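
As a side note, the six near-identical checks in the patch above could be folded into a loop. The following is only a standalone sketch of that idea, not uriparser code: TextRange and UriRanges are hypothetical stand-ins for the library's types, and text_range_invalid and first_invalid_range are made-up names. It applies the same rule as TextRangeInvalid.

```c
#include <stddef.h>

/* Hypothetical stand-in for uriparser's TextRange: two pointers into the input. */
typedef struct {
	const char *first;
	const char *afterLast;
} TextRange;

/* Hypothetical stand-in holding only the six ranges the patch checks. */
typedef struct {
	TextRange scheme;
	TextRange userInfo;
	TextRange hostText;
	TextRange portText;
	TextRange query;
	TextRange fragment;
} UriRanges;

/* Same rule as TextRangeInvalid: both NULL is fine; exactly one NULL,
 * or first > afterLast, is not. */
static int text_range_invalid(const TextRange *range) {
	if (range->first == NULL && range->afterLast == NULL) return 0;
	if (range->first == NULL || range->afterLast == NULL) return 1;
	return range->first > range->afterLast;
}

/* Returns the name of the first invalid range, or NULL if all six are fine. */
static const char *first_invalid_range(const UriRanges *uri) {
	static const char *const names[6] = {
		"scheme", "userInfo", "hostText", "portText", "query", "fragment"
	};
	const TextRange *ranges[6];
	size_t i;

	ranges[0] = &uri->scheme;
	ranges[1] = &uri->userInfo;
	ranges[2] = &uri->hostText;
	ranges[3] = &uri->portText;
	ranges[4] = &uri->query;
	ranges[5] = &uri->fragment;

	for (i = 0; i < 6; i++) {
		if (text_range_invalid(ranges[i])) {
			return names[i];
		}
	}
	return NULL;
}
```

A caller could print the returned name once and return a syntax error, which keeps the defensive check in one place until the parser itself is fixed.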

Will have a closer look as soon as time permits. Thumbs up for the report!

Thanks! I have been running a fuzzer for a week with 28 cores and have yet to find anything else. However, my fuzzing code doesn't have all the API calls and only covers ASCII/char. I build with wchar_t disabled.

I have looked into the //:%aa@ case a bit by now (there is more to do).
What I found so far:

  • URI //:%aa@ is well-formed despite looking weird (see grammar trace below)
  • The hostname of a URI can be empty (see grammar trace below)
  • hostText.first points to right after the @, which looks okay
  • hostText.afterLast points to the : near the start of the input, i.e. before hostText.first (it is set by ParseOwnHostUserInfoNz, where I would not expect it)
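
To make the pointer state from the last two bullets concrete, here is a small standalone sketch. It does not call uriparser. TextRange is a hypothetical stand-in for the library's struct, and the helper names are made up. The "expected" range (empty, starting right after the @) is an assumption based on the observation that the host may be empty.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for uriparser's TextRange: two pointers into the input. */
typedef struct {
	const char *first;
	const char *afterLast;
} TextRange;

/* Signed length of a range; a negative value means afterLast precedes first. */
static ptrdiff_t range_length(const TextRange *range) {
	return range->afterLast - range->first;
}

/* hostText as described in the findings: first is right after '@',
 * afterLast is at the ':'. Assumes uri contains both ':' and '@'. */
static TextRange reported_host_text(const char *uri) {
	TextRange range;
	range.first = strchr(uri, '@') + 1;
	range.afterLast = strchr(uri, ':');
	return range;
}

/* Assumed correct result for an empty host: an empty range after '@'. */
static TextRange expected_host_text(const char *uri) {
	TextRange range;
	range.first = strchr(uri, '@') + 1;
	range.afterLast = range.first;
	return range;
}
```

For "//:%aa@" the reported range has first at offset 7 and afterLast at offset 2, so range_length is -5, while the assumed correct range has length 0. Code that converts afterLast - first to an unsigned length would see a huge value, which would be consistent with ASAN flagging the result.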

Here is the (expected) grammar trace:

Case: //:%aa@

URI-reference = URI / relative-ref
 relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
  relative-part = "//" authority path-abempty
=> eats "//"
   authority     = [ userinfo "@" ] host [ ":" port ]
=> eats ":%aa" and then "@", hostname can be empty (see reg-name below)
    userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
    host          = IP-literal / IPv4address / reg-name
     reg-name      = *( unreserved / pct-encoded / sub-delims )