nodejs / llhttp

Port of http_parser to llparse

Home Page: http://llhttp.org

Better handling of invalid character errors

Dreamsorcerer opened this issue

We have a number of user reports in aiohttp where they encounter an "invalid character in header" error.

Please correct me if any of this is already possible (I was not involved in the llhttp integration).

The biggest issue is that when encountering the error, it does not tell us what the raw header is, or which character in the header is incorrect. If we can expose this information to the user, it would make it a lot easier to debug broken server responses.

Sometimes, you have no control over the server in question, so it would also be great to have a way to simply discard any invalid headers and just raise a warning to the user when this happens.

Relevant discussion: aio-libs/aiohttp#5269 (comment)
Our llhttp code: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/_http_parser.pyx#L800

Are you setting the lenient headers flag? Does that help?

I have the same issue. I have tried AIOHTTP_NO_EXTENSIONS=1 and the issue still persists. I have no control or influence on the remote server. With no workaround for AIOHTTP, I implemented HTTPX in my code base for this one remote server. A workaround would be great as managing two request libraries seems unnecessary.

Are you setting the lenient headers flag?

I am going to assume not. Am I right in thinking we just need to call llhttp_set_lenient_headers() on startup? I'll try and give it a go later.

I do notice that the flag in the code says (USE AT YOUR OWN RISK) and is disabled by default, which suggests it's not the best solution though. We tend to err on the more cautious side, following the published standards, so it'd still be preferable to have a more strict approach (such as my proposal to discard invalid headers with a warning, rather than fully aborting).

Yes, that's how you enable it. I don't want to get too confrontational here, but you're dealing with a non-conforming server; if you want standards conformance, the library is doing the right thing already. If you want it to ignore the invalid headers, you need to set that lenient flag.
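
Roughly, the setup looks like this (a minimal sketch; the settings callbacks are placeholders for whatever your integration registers):

#include "llhttp.h"

static void init_lenient_parser(llhttp_t* parser, llhttp_settings_t* settings) {
  llhttp_settings_init(settings);
  /* ... assign the on_header_field / on_header_value / on_message_complete
   * callbacks your integration needs ... */
  llhttp_init(parser, HTTP_BOTH, settings);
  /* Lenient header parsing is off by default; opt in explicitly. */
  llhttp_set_lenient_headers(parser, 1);
}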

You're right that it's disabled by default; Node.js generally sets the defaults to what it expects in its own use case, and other users need to adjust accordingly, which I think is fine. It sounds like your suggestion is to add a warning mode to llparse rather than just an error mode? I'll confess that I'm not really a Node.js user, so I'm not sure I should implement something like that, and in the model llparse uses I'm not even sure how you'd signal it.

I was thinking of adding a utility function that sets all lenient flags (or maybe all safe-ish lenient flags), or adding an initializer that does so, since I only use llhttp in the context of pallas/pyllhttp or the C code directly, and I too prefer the "try to figure it out if you don't absolutely have to bail" version of the code.

Yep, I generally agree. But it is frustrating when an invalid header stops everything and doesn't even give any indication of what the invalid data was. As an example, the case I discovered at work had an invalid character in a random Set-Cookie header, but it was an API endpoint where we wouldn't be using any of those headers anyway. So, discarding the headers would avoid us receiving anything invalid, while still allowing us to work with the API.

The other suggestion though, was just to provide more information in the errors, so we can see the raw value of the header, and ideally, which character was invalid. Without this information, users generally just see these errors as a mystery and keep filing bug reports to tell us our library is broken.

Absolutely understand. Unfortunately, I'm not an expert in llparse and it sounds like that kind of change would have to be done there. I can imagine adding a callback that llhttp invokes on a span if the error becomes a warning due to leniency, but I'm really not sure the right way to go about that.

Unfortunately we cannot modify the error messages to show the offending byte.
However, llhttp already provides llhttp_get_error_pos.

From the source code:

/* Returns the pointer to the last parsed byte before the returned error. The
 * pointer is relative to the `data` argument of `llhttp_execute()`.
 *
 * Note: this method might be useful for counting the number of parsed bytes.
 */
LLHTTP_EXPORT
const char* llhttp_get_error_pos(const llhttp_t* parser);

In other words, you can either get that and then check its first byte, or have your code compute the difference for the data pointer to see how many bytes have been consumed (@pallas you might want to add this to pyllhttp).
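
For example, something like this (a rough sketch; how the information is surfaced is up to the embedder):

#include <stdio.h>
#include "llhttp.h"

/* Sketch: report how far parsing got when llhttp_execute() fails.
 * llhttp_get_error_pos() returns a pointer into `data`, so the offset of
 * the failure is simply (pos - data). */
static void execute_and_report(llhttp_t* parser, const char* data, size_t len) {
  llhttp_errno_t err = llhttp_execute(parser, data, len);
  if (err != HPE_OK) {
    const char* pos = llhttp_get_error_pos(parser);
    size_t consumed = (pos != NULL) ? (size_t)(pos - data) : 0;
    fprintf(stderr, "%s: %s after %zu bytes (last parsed byte: 0x%02x)\n",
            llhttp_errno_name(err), llhttp_get_error_reason(parser),
            consumed,
            (pos != NULL && consumed < len) ? (unsigned)(unsigned char)*pos : 0u);
  }
}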

Did this solve your problem?

Thanks for the suggestion! I'll see how best to implement it in the python module.

In other words, you can either get that and then check its first byte, or have your code compute the difference for the data pointer to see how many bytes have been consumed (@pallas you might want to add this to pyllhttp).

Did this solve your problem?

Sounds promising, yes. If that works, then the only thing missing is probably being able to turn the error into a warning and continue parsing without the broken header.

Closing this for now then. Please reopen if you still have a problem.

Just coming back round to this. I've managed to get the error information working in our code.

However, the recommendation above to enable lenient headers appears to be vulnerable to GHSA-cggh-pq45-6h9x.
If we are going to use lenient options, can we get some security guarantees in future versions that they won't introduce known vulnerabilities?

For example, given the input POST / HTTP/1.1\r\nHost: localhost:8080\r\nX-Abc: \rxTransfer-Encoding: chunked\r\n\r\n, I'd expect the output for lenient headers to produce a header like {'X-Abc': 'xTransfer-Encoding: chunked'}, but currently it actually produces the vulnerable {'X-Abc': '', 'Transfer-Encoding': 'chunked'}.

Or, following the idea of warnings rather than errors: now that I know a little more about the API, maybe it would be possible to get the error from the parser (same as it currently works), but then be able to continue parsing, which would just throw away the current line and continue as if it weren't present?

@Dreamsorcerer I'm currently handling this in the upcoming llhttp 9. Will keep you posted on this.

I should have some time to look at llhttp 9 at the weekend. Are there any changes to help with this error handling? It looks like every lenient option now comes with a warning about allowing request smuggling, so that doesn't look like a viable approach to go down.

So, ideally, a lenient option that doesn't expose request smuggling vulnerabilities would be available. Or, alternatively, some way to get parse errors and then continue parsing after discarding the rest of that line could also work (we can raise the error if it relates to Content-Length/Transfer-Encoding, or just log it as a warning if not).
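
To make that concrete, this is roughly the policy I have in mind (a hypothetical sketch: llhttp has no way to resume after discarding a line today, and the framing-related errno names should be checked against the version you build against):

#include <stdbool.h>
#include "llhttp.h"

/* Hypothetical policy: treat framing-related errors (Content-Length /
 * Transfer-Encoding) as fatal, and downgrade a bad header token to a
 * warning.  The "continue parsing" step would need new llhttp API. */
static bool error_is_fatal(const llhttp_t* parser) {
  switch (llhttp_get_errno(parser)) {
    case HPE_INVALID_CONTENT_LENGTH:
    case HPE_UNEXPECTED_CONTENT_LENGTH:
    case HPE_INVALID_TRANSFER_ENCODING:  /* assuming this errno exists in your llhttp version */
      return true;    /* message framing is suspect: abort the connection */
    case HPE_INVALID_HEADER_TOKEN:
      return false;   /* could be logged as a warning instead */
    default:
      return true;
  }
}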

As an example, one case that came up was a Set-Cookie header that contained \x01 (if I remember correctly). It would have been safe to just discard the cookie header as we didn't need it. Users have reported some similar oddities from other servers. (Ironically, it appeared that invalid cookie was from Google Analytics or similar. You'd think they'd know better...)

That's not the philosophy behind llhttp, unfortunately.
The parser aims to be faster by just parsing the data and forwarding it to the developer without any further processing.
The developer is then responsible for handling the data. The only choice llhttp offers is whether to be strict to the spec (now the default as of llhttp 9) or loose in certain areas (via leniency flags).

Since it's 2023, I think it is now safe to be strict by default (to avoid vulnerabilities) and only be loose when explicitly needed (which I discourage anyway; I think it's better to fix the non-compliant end if possible).

Since it's 2023, I think it is now safe to be strict by default (to avoid vulnerabilities) and only be loose when explicitly needed (which I discourage anyway; I think it's better to fix the non-compliant end if possible).

The linked issue is less than 3 years old, and several people have reported similar problems throughout that time (the issue I encountered, which seemed to come from Google Analytics, was only last year). The problem is that other libraries (like Python's requests) do not error when encountering these bad headers. I think a common cause is a web app trying to put non-ASCII characters into headers without encoding them correctly. Obviously, these are other people's servers in the wild, so fixing them is usually not possible.

Thinking about it though, there is probably a difference between request/response. So, maybe it would be safe to enable lenient options on response parsing?

Absolutely, I totally agree.
Enabling leniency as a server on received requests is dangerous, as people will attack you.
Enabling leniency on responses when acting as a client is much safer, because you know who you are calling (especially if using HTTPS).
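
As a sketch of that split (assuming your embedder constructs separate parsers for the server and client paths):

#include "llhttp.h"

/* Sketch: apply leniency only when parsing responses as a client,
 * never for requests received as a server. */
static void init_parser(llhttp_t* parser, llhttp_settings_t* settings,
                        llhttp_type_t type) {
  llhttp_init(parser, type, settings);
  if (type == HTTP_RESPONSE) {
    llhttp_set_lenient_headers(parser, 1);  /* acceptable risk for a client */
  }
}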