nodejs / http-parser

Hello. Please look in RFC 7230 - 3.3.3 Message Body Length

If a Transfer-Encoding header field is present and the chunked
transfer coding (Section 4.1) is the final encoding, the message
body length is determined by reading and decoding the chunked
data until the transfer coding indicates the data is complete.

If a Transfer-Encoding header field is present in a response and
the chunked transfer coding is not the final encoding, the
message body length is determined by reading the connection until
it is closed by the server.  If a Transfer-Encoding header field
is present in a request and the chunked transfer coding is not
the final encoding, the message body length cannot be determined
reliably; the server MUST respond with the 400 (Bad Request)
status code and then close the connection.

As I can understand it means that both Transfer-Encoding: gzip, chunked and Transfer-Encoding: chunked, gzip are possible. chunked, gzip may look meaningless but it is possible and its behaviour is well described.

the message body length is determined by reading the connection until it is closed by the server means that we need to decode gzip first and than chunked.

But it means that all nginx, mongrel, nodejs (and its descendants) http parsers are broken by design. chunked encoder/decoder should be shipped outside http parser as separate library. It should be used by end-user together with gzip, brotli, zstd, compress encoders/decoders.

Another solution is to develop fat parser with gzip, brotli, zstd, compress, etc dependencies.

This issue can't be fixed. I want just to inform everyone using http parser.

I don't understand what you think the issue is.

chunked, gzip may look meaningless but it is possible and its behaviour is well described.

Yes, its behaviour is described:

If a Transfer-Encoding header field is present in a request and the chunked transfer coding is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.

@sam-github, this exception is assigned for the message body length cannot be determined reliably + chunked transfer coding is not the final encoding. It means that you can't receive HTTP header with Transfer-Encoding: chunked, gzip and without Content-Length.

But if Content-Length exists than Transfer-Encoding: chunked, gzip is valid header.

From the very same 3.3.3:

   If a message is received with both a Transfer-Encoding and a
   Content-Length header field, the Transfer-Encoding overrides the
   Content-Length.  Such a message might indicate an attempt to
   perform request smuggling (Section 9.5) or response splitting
   (Section 9.4) and **ought to be handled as an error**.

In other words, Transfer-Encoding cannot be present together with Content-Length.

It must be said that the presence of both headers leads to quite practical request smuggling attacks as was demonstrated by security researchers.

chunked encoder/decoder should be shipped outside http parser as separate library. It should be used by end-user together with gzip, brotli, zstd, compress encoders/decoders.

This simply can't work as the presence of chunked TE alters protocol in a way that cannot be offloaded to a 3rd party module.

Hello @indutny, I can't agree with you, please read carefully.

If a Transfer-Encoding header field is present in a response and the chunked transfer coding is not the final encoding, the message body length is determined by reading the connection until it is closed by the server.
If a Transfer-Encoding header field is present in a request and the chunked transfer coding is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.
If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling.

3 exception is related to request only, response just reads as much as possible data from server and has no possible security exceptions.

Please read RFC 7230 - 3.3.1 Transfer-Encoding for clarification too.

If any transfer coding other than chunked is applied to a request payload body, the sender MUST apply chunked as the final transfer coding to ensure that the message is properly framed.
If any transfer coding other than chunked is applied to a response payload body, the sender MUST either apply chunked as the final transfer coding or terminate the message by closing the connection.

This simply can't work as the presence of chunked TE alters protocol in a way that cannot be offloaded to a 3rd party module.

I see no collisions. Transfer-Encoding: chunked, gzip means just gzip after chunked. User are reading data, than applies gzip and than chunked.

I was wrong about Content-Length: it can't be used to identify the length of body if Transfer-Encoding was used, this usage is forbidden. But responses without Content-Length can have Transfer-Encoding with chunked in any position with other encodings.

Well, FWIW it is allowed for responses:

http-parser/http_parser.c

Lines 1904 to 1913 in 5c5b3ac

    
                     } else { 
        
                       /* RFC 7230 3.3.3 */ 
        
                       /* If a Transfer-Encoding header field is present in a response and 
        
                        * the chunked transfer coding is not the final encoding, the 
        
                        * message body length is determined by reading the connection until 
        
                        * it is closed by the server. 
        
                        */ 
        
                       UPDATE_STATE(s_body_identity_eof); 
        
                     }

Parser doesn't depend on gzip, brotli, etc it can't decode chunked for user, because there is another encoding on the top of chunked. User have to handle chunked by himself anyway.

	} else {
	/* RFC 7230 3.3.3 */

	/* If a Transfer-Encoding header field is present in a response and
	* the chunked transfer coding is not the final encoding, the
	* message body length is determined by reading the connection until
	* it is closed by the server.
	*/
	UPDATE_STATE(s_body_identity_eof);
	}

Chunked may not be final in transfer encoding list