whatwg / fetch

Fetch Standard

Home Page:https://fetch.spec.whatwg.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Refactoring WebSocket connections

annevk opened this issue · comments

The objective is to make WebSocket connections hook into "obtain a connection" to make it clear what logic is shared.

There are two problems:

  1. Step 2 of https://datatracker.ietf.org/doc/html/rfc6455#section-4.1.
    1. Defining this properly requires an abstract notion of turning a domain into an IP address. We might need that anyway for features such as dns-prefetch, but I recall @sleevi hoping we could avoid it.
    2. The side table might also have to be partitioned to be completely safe. This would not significantly widen the DOS scope as the attacker would be constrained by the number of tabs the user has open where the top-level site is unique and the attacker is embedded. An argument could be made that the privacy attack (not the DOS attack) is nearly indistinguishable from network jitter however and not worth the effort.
  2. Requiring a dedicated connection, but only for HTTP/1. There are two approaches here:
    1. Introduce a new parameter.
    2. Turn the dedicated parameter into an enum: "not", "if-http1", "always". This would require slight tweaks to WebTransport. I slightly prefer this, but the alternative is acceptable to me as well.

We could side step some of the first problem by changing the monkey patching in "obtain a WebSocket connection", but that does not seem like a good long term solution.

This came out of a discussion in #1241 which had a rather optimistic-and-wrong take on what sharing more logic would mean. As I envision it now we would keep the "obtain a WebSocket connection" operation, but we'd pass it a partition key as well, inline the relevant bits from the RFC as outlined above, and make it use "obtain a connection".

If this seems like a good plan I will make time to work on this. (I would like to solve this before #889 and #1118.)

cc @ddragana @MattMenke2 @ricea @yutakahirano

CC’ing @letitz because the DNS discussion is also related to WICG/private-network-access#51

My hope in avoiding is largely a proximity of the complexity it introduces, and because of the handwave “magic happens here” part of the composability of different protocol layers.

  • How to account for Happy Eyeballs behaviours (which vary across implementations, and may extend to racing connections and not just racing resolutions)
  • How to account for the different resolution layers and behaviours (DNS vs WINS/NETBIOS vs mDNS, etc)
  • How to account for other protocol races (e.g. for alternate protocols returned via DNS)

I do think you’re right that we need some aspect of handwave to “turn this host into a collection of IP addresses”, which can then be accounted for in things like WebSockets or private network access, but specifying it in a way in which these differences continue to remain valid/opaque alternatives, and not accidentally specifying a linear sequence that MUST be followed, is the tricky part.

That said, I think your proposed approach generally sounds workable. One bit of challenge for Chromium is that we don’t (generally) think of dedicatedness as an explicit input parameter. In general, we want a new connection, either with a particular protocol version (e.g. if an H/2 server wants to use TLS Client cert auth or NTLM/Negotiate and responds with HTTP_1_1_REQUIRED, or situations like WebSockets where H/2 may not be allowed widely), or explicitly reusing the current connection (e.g. completing the third leg of an NTLM/Negotiate handshake). While in the current proposal, I think the existing implementation would conform with your rough proposed language, I worry if there might end up being impedance mismatches if that parameter gets reused elsewhere.

Here's a hypothetical privacy attack on a non-partitioned side-table:

  1. a.com and b.com embed evil.com
  2. On loading, evil.com attempts to probe its current "throttle cookie". If the probe fails, it attempts to set a random "throttle cookie" instead. In both cases, the "throttle cookie" is communicated back to evil.com to correlate the loads.
  3. Setting a "throttle cookie" works as follows:
    1. Choose a random non-empty subset of ws0.evil.com to ws31.evil.com
    2. Connect repeatedly to ws://wsX.evil.com/set, making sure there are always at least two handshake attempts pending
    3. The servers at wsX.evil.com are configured to wait 2 seconds before responding to the handshake when they see a connection to the /set endpoint
  4. Probing a "throttle cookie" works as follows:
    1. Attempt to connect to all of ws0.evil.com to ws31.evil.com, using the endpoint ws://wsX.evil.com/probe
    2. The servers always respond immediately to the /probe endpoint.
    3. Time how long the handshakes take
    4. Consider a handshake that took >= 2 seconds to be in the set, and < 2 seconds to not be in the set.

This permits evil.com to associate the two sessions. Is this a privacy attack we need to worry about?

@ricea: Yes, I think that attack is exactly the sort of thing network partitioning is intended to mitigate.

Here's a sketch for resolve a domain:

To resolve a domain, given a domain domain and a network partition key key, perform an implementation-defined operation to turn domain into a set of one or more IP addresses. If this operation succeeds, return the set of IP addresses. If it fails, return failure. The results of this operation may be cached. If they are cached, key must be used as part of the cache key.

Note: typically this operation would involve DNS. The particulars (apart from the cache key) are not tied down as they are not pertinent to the system the Fetch Standard establishes. Other documents ought not to build on this primitive without having a considered discussion with the Fetch Standard community first. [DNS]

Picking an IP address from this set to obtain a connection from can be done in an implementation-defined manner. I suppose the WebSocket connection steps would compare each IP address against existing WebSocket connection IP addresses.

@sleevi how does WebSocket obtain a new-or-existing connection in Chromium with that setup then? We have the dedicated parameter for WebTransport. It seems like WebSocket needs it as well, though in a more limited way. I'm not aware of anything else needing it.

@ricea thanks! I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1713748 to track that in Firefox. My proposal is that we use the network partitioning key, but the way Chrome defines that key it would be far less successful at avoiding the DOS concerns this cache was meant to address, I think. Not sure to what extent that's a problem.

Chrome uses a separate pseudo-socket pool for non-H2/H3 WebSocket connections, which enforces WebSocket connection limits (And uses slightly different connection attempt ordering rules in the case of multiple IP addresses for a single DNS lookup, since WebSocket enforces limits on a per-IP-address basis, I believe). Chrome never repurposes an HTTP/1.x socket for use with WebSockets.

For H2 WebSocket connections, it looks for an existing H2 connection that supports WebSockets to the corresponding origin.