nodejs / llhttp

Port of http_parser to llparse

Home page: http://llhttp.org


9.1.0 regression: doesn't parse message following 100 response

Dreamsorcerer opened this issue · comments

@ShogunPanda Sorry to bother you again. While updating everything and running our full test suite, it appears we've found a regression introduced in 9.1.0.

When given a response like this:
```
HTTP/1.1 100 Continue\r\n\r\nHTTP/1.1 404 Not Found\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 14\r\nDate: Fri, 15 Sep 2023 19:47:23 GMT\r\nServer: Python/3.10 aiohttp/4.0.0a2.dev0\r\n\r\n404: Not Found
```

In 9.1+ this only parses the initial 100 response and nothing more. In older releases this would parse the 100 response and then parse the 404 response. This causes our client to hang and then timeout, thinking that the server never sent a response.
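For reference, the expected behavior can be sketched in plain Python (this is an illustration of the interim-response semantics from RFC 9110 section 15.2, not llhttp's actual implementation): a client must skip any number of 1xx interim responses and keep parsing until a final status arrives.

```python
# Sketch (pure Python, not llhttp): walk header blocks in the raw stream,
# skipping 1xx interim responses, and return the final status code that
# the client expects the parser to surface.
RAW = (
    b"HTTP/1.1 100 Continue\r\n\r\n"
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Length: 14\r\n"
    b"Date: Fri, 15 Sep 2023 19:47:23 GMT\r\n"
    b"Server: Python/3.10 aiohttp/4.0.0a2.dev0\r\n"
    b"\r\n"
    b"404: Not Found"
)

def final_status(raw: bytes) -> int:
    """Return the status of the first non-interim (non-1xx) response."""
    rest = raw
    while True:
        # Each header block ends with a blank line (\r\n\r\n).
        head, _, rest = rest.partition(b"\r\n\r\n")
        status = int(head.split(b" ", 2)[1].decode())
        if not 100 <= status < 200:  # final response reached
            return status

print(final_status(RAW))  # prints 404
```

A parser that stops after the 100 block never reaches the 404, which is exactly the hang described above.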

No worries.
Yeah, I also noticed it. It is already fixed in 9.1.2.

Sorry, I was testing in 9.1.2 originally, it's not working for us. I just tested some older releases to see where the behaviour changed.

I wonder if we could figure out a way to run some of our test suite in your CI; it could be helpful for detecting regressions. I think we have easily 100+ tests that exercise llhttp.

That would be sick!
If you can send a PR with your tests in the Markdown format we currently use it would be amazing.
In which format do you currently have such tests?

They are Python tests, and many of them test functionality of our client and server rather than the parser output directly. E.g., the test that caught this verifies that we get a 404 response from an Expect: 100-continue request. This test fails because llhttp isn't parsing the 2nd message.

So, I think the only way we could make it work is to add a new test job that clones our repo, rebuilds the parser with the current llhttp code, and then executes a curated selection of our test suite. It would all be done in a separate job that doesn't touch anything else in your repo.
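A rough sketch of what such a job could look like (every name, path, and build command below is an assumption for illustration; the real workflow would need to mirror aiohttp's actual vendoring and build steps):

```yaml
# .github/workflows/aiohttp-integration.yml (hypothetical)
name: aiohttp integration
on:
  schedule:
    - cron: '0 4 * * *'   # nightly; failures surface only on this repo
  workflow_dispatch:

jobs:
  aiohttp:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4            # current llhttp sources
      - uses: actions/checkout@v4
        with:
          repository: aio-libs/aiohttp
          path: aiohttp
      - run: npm ci && make                  # build llhttp from this checkout
      - name: Rebuild aiohttp's parser against this llhttp
        run: |
          # Illustrative: copy the freshly built C sources over aiohttp's
          # vendored copy before building the extension.
          cp -r build/c/* aiohttp/vendor/llhttp/
          pip install -e ./aiohttp
      - name: Run curated parser tests
        run: python -m pytest aiohttp/tests/test_http_parser.py
```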

I see.
Makes sense. In that case if your repo would send an automated message to our repo in case of failures (for instance, opening an issue via GH API) it would be great!

I was thinking I could just make a job on this repo that performs those steps, I'll try playing with it in the next few weeks.

Amazing!
Let me know if I can help in any way. I have limited understanding of Python (never used it in the last 10 years), but I might review the PR.

As the primary maintainer of the llhttp package in Fedora Linux, and as one of the people who often works on the python-aiohttp package, I’m thrilled to see this plan to work together on testing.

GitHub plumbing is not my strong suit, but I’m here if there’s anything I can do to help.

> I see. Makes sense. In that case if your repo would send an automated message to our repo in case of failures (for instance, opening an issue via GH API) it would be great!

That's not going to be possible with GHA because:

  1. GHA jobs are basically stateless. Each new cron job failure would open a new PR, which would just create spam.
  2. Side effects caused by the GHA's default token don't propagate, so nobody would get notifications.

This could be achieved by making a stateful GitHub App but it's a bigger effort to maintain.

Other possibilities:

  1. Having llhttp's GHA send repository_dispatch or workflow_dispatch events to the aiohttp repo. This one doesn't seem scalable and suffers from a similar token problem. The latter can be addressed by integrating a GitHub App for retrieving a custom token, but it probably doesn't make sense to do this for aiohttp and would require a lot of coordination.
  2. I've seen some projects running downstream test suites of important packages using a lib. Example: https://github.com/pyca/cryptography/blob/73d070e/.github/workflows/ci.yml#L349-L404 / https://github.com/pyca/cryptography/tree/73d070e/.github/downstream.d.

I think that the second option (running downstream test suites) is the most realistic path forward.

@ShogunPanda Any idea when you'll get to look at this? We're still blocked on making a new release currently.

@ShogunPanda @Dreamsorcerer Now that Python 3.12 is out, it'd be great to have aiohttp working on it!

This is blocking aiohttp's release for the latest version of Python with cascading effects on many downstream packages. aio-libs/aiohttp#7639, aio-libs/aiohttp#7640.

Downstream packages and projects depending on aiohttp cannot update to Python 3.12 due to this issue.

@ShogunPanda Please merge #250, figure out the regression in 9.1.x, and release a new version of llhttp.

Thank you.

I've bisected the regression to 6922f5b.

In particular, the following check:

llhttp/src/native/http.c, lines 45 to 49 at e059b1b:

```c
/* See RFC 2616 section 4.4 - 1xx e.g. Continue */
(
  parser->type == HTTP_RESPONSE &&
  (parser->status_code == 100 || parser->status_code == 101)
)
```

Just removing it would resolve the issue.

I'm on it at the moment. I hope to give an update within an hour.

@cdce8p Thanks for the bisecting. Will take a look right now.

I found the issue. Releasing in a few minutes.

Released v9.1.3.

> Released v9.1.3.

@Dreamsorcerer Does it work with v9.1.3? Because https://github.com/nodejs/llhttp/actions/runs/6394836947 failed.

EDIT: Or is this due to the wording of an error message being changed? If so, please update.

Yes, it does. The one test should be adjusted in aiohttp itself: the error message changed slightly, so the regex doesn't match anymore. That can easily be fixed.