nodejs / llhttp

Port of http_parser to llparse

Home Page:http://llhttp.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Problem with on_body callback with bare LF

Dreamsorcerer opened this issue · comments

I'm testing the fix for #236, with a payload based on a user report:

HTTP/1.0 200 OK\nFoo: abc\nBar: def\n\nBODY\n

If I add CR before the body, then it works fine:

HTTP/1.0 200 OK\nFoo: abc\nBar: def\r\n\r\nBODY\n

But, with the original, it appears that the on_headers_complete callback is not happening. The on_body callback happens without a call to on_headers_complete.

This seems like something is broken as we are not getting the headers.
But, also, should we always get on_headers_complete before on_body happens? Currently our code fails with an exception, as it assumes that on_body will only be called after on_headers_complete is called.

It appears we have some users who rely on this behaviour to interface with switch hardware. Given that they likely can't rewrite the firmware of the switch, they'll only be able to use this when this issue is fixed and we can parse responses that do not use CRLF at all.

I see. I'll take a look between today and tomorrow.

We're hoping to do a new release next week, is there any chance this might be resolved and released in that timeframe?

Sorry, I got busy with other things. I'll try to look into it on Mon or Tue next week.

@Dreamsorcerer As promised, this is now fixed in #243 (a new flag is needed to enable the fix).
This will be included in llhttp 9.1.0.

Perfect, if you have time for another small addition, another user has just reported they have a device that responds with whitespace in the chunk sizes (e.g. 37 \r\n): aio-libs/aiohttp#7597 (comment)

I couldn't find any lenient options that allow that to be permitted (I've asked them to file a new issue here if they want it to work).

I'm sorry but I don't think I will implement that. We can't support any possible uncompliant case and this seems really peculiar.

Not sure how you embed llhttp but I'd rather try to intercept the data before feeding it into llhttp.

Well, I'll let you know if any other users report the same problem. They did point out that this is permitted by atleast firefox, python-requests, curl, perls lwp-request, chrome, and previous versions of aiohttp/llhttp.

@mcollina @RafaelGSS @nodejs/tsc Any opinion on this? Shall we support this specific case?

@ShogunPanda Just tested 9.1.1 and I seem to still get no on_headers_complete.

Enabled options are:
llhttp_set_lenient_headers
llhttp_set_lenient_optional_cr_before_lf
llhttp_set_lenient_spaces_after_chunk_size

Can you post the entire message you are trying to parse?

The original post contains the entire message. I'm just using that for a unit test, which is still failing:

def test_http_response_parser_bad_crlf(response: Any) -> None:
    messages, upgrade, tail = response.feed_data(
        b"HTTP/1.0 200 OK\nFoo: abc\nBar: def\n\nBODY\n"
    )
    msg = messages[0][0]

    assert msg.headers["Foo"] == "abc"
    assert msg.headers["Bar"] == "def"

Sorry, you are right. Despite the message being parsed, the callback is ignored. Working on it as we speak.

The chunked whitespace is passing now, so that's good atleast.

This should have been fixed in 9.1.2. Can you please confirm?

Test is passing now, thanks.

Can we expect that on_body will never be called without a call to on_headers_complete first, now? I realised when we testing this, that our code assumes that can't happen, and hits an exception if it does.

In theory yes.
In practice if it happens then we have a bug :)