jshttp / type-is

Infer the content-type of a request.

Obscure condition checks

binarykitchen opened this issue · comments

There is a bad bug: all my JSON requests return false on req.is('json') within the expressjs v4 code :(

I debugged and think these lines do not make any sense at all:

  if (!(parseInt(headers['content-length'], 10)
    || 'transfer-encoding' in headers)) return;

In my requests I have no content-length yet and there is no transfer-encoding.

This here is my header:

{ host: 'localhost:8080',
  connection: 'keep-alive',
  'cache-control': 'no-cache',
  pragma: 'no-cache',
  'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36',
  'content-type': 'application/json',
  accept: '*/*',
  dnt: '1',
  referer: 'https://localhost:8080/',
  'accept-encoding': 'gzip,deflate,sdch',
  'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6',
  cookie: 'session=81e32809-7efe-447d-bbda-624acbbc2543; session.sig=d2gT3Yssb_ZLqYixqojzEGs53j8; _ga=GA1.1.1416672249.1394592952; language=en; express:sess=eyJwYXNzcG9ydCI6e319; express:sess.sig=bhxojHt1UQWbtKNR_ztU_-X2LLg' }

It must work with these!

what crazy client is making these requests?

The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.3

this sounds like a bug with your client
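For readers following along, here is a minimal sketch of what the quoted guard effectively tests. The name hasBody is made up for illustration and is not part of type-is; the logic mirrors the two quoted lines and assumes header names are already lowercased, as Node.js provides them.

```javascript
// Hypothetical helper mirroring the quoted check; not the library's API.
function hasBody(headers) {
  // parseInt gives NaN for a missing header and 0 for "0"; both are falsy.
  const length = parseInt(headers['content-length'], 10)
  return Boolean(length) || 'transfer-encoding' in headers
}

// Content-Type alone does not count as having a body.
console.log(hasBody({ 'content-type': 'application/json' })) // false
// An explicit zero length is treated as no body as well.
console.log(hasBody({ 'content-type': 'application/json', 'content-length': '0' })) // false
// A non-zero length or any Transfer-Encoding header signals a body.
console.log(hasBody({ 'content-type': 'application/json', 'content-length': '17' })) // true
console.log(hasBody({ 'transfer-encoding': 'chunked' })) // true
```

So a request that carries only a content-type header, like the one posted above, fails the guard before its type is ever compared.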

This client https://github.com/visionmedia/superagent

Has quite a lot of stars and forks.

And really, content-length is only relevant when it comes to the response header. But in my case I am asking the request if it's a JSON or not. Totally irrelevant what the content length of the response will be!

well it should at least have transfer-encoding. we also do this because jquery sends the content-type header without a body, then people complain when it throws a 400 error.

Does superagent set transfer-encoding?

i'm pretty sure that's something the browser sends automatically. i don't think xhr can set that header.

{
            "host" : "10.0.1.2:3000",
            "accept-encoding" : "gzip, deflate",
            "accept-language" : "en-us",
            "accept" : "*/*",
            "origin" : "http://10.0.1.2:3000",
            "content-length" : "0",
            "connection" : "keep-alive",
            "user-agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14",
            "x-csrf-token" : "LBvWj8zfB0-cjpTd3oJzsxP8bLA4X0K+pA3PAY=",
            "dnt" : "1"
        }

that's what my request headers look like and i'm using superagent

that's node though

Alright. But why am I seeing no content-length here?? Grrr ...

And since you said, that's something the browser is sending, how can I check in my Chrome if Chrome is doing it correctly?

just open up the inspector. shows you request/response/body

Yes, I already do that. I only see the content-length in the response, but not in the request.

Yeah I see and believe you. But I do not have any content-length here at all!

Request URL:https://localhost:8080/video?size=preview&from=0&limit=9
Request Method:GET
Status Code:200 OK
Request Headers (view source)
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8,de;q=0.6
Connection:keep-alive
Content-Type:application/json
Cookie:session=2f6a66cd-8d04-4d72-9026-5fccef39d649; session.sig=O378FHdqAXw3JEvgnPSEMdFbKaM; connect.sess=s%3Aj%3A%7B%22passport%22%3A%7B%22user%22%3A%2211e3-bf90-86a1bfb0-9574-198ee2fcdadb%22%7D%7D.Qt%2BUAiwanyEVaYlsRKUiW%2BwZ%2BFTxY1uc%2F7LjkSy54yM; express:sess=eyJwYXNzcG9ydCI6e319; express:sess.sig=bhxojHt1UQWbtKNR_ztU_-X2LLg
DNT:1
Host:localhost:8080
Referer:https://localhost:8080/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36

haha at this point, just ask stackoverflow or something. beyond what i know

I could email you a secret link to a demo site. Would you do this?

Just sent you an email ...

Sorry to bug you but this is really important.

If you look at the older ExpressJS code you see it is very different:
https://github.com/visionmedia/express/blob/3.x/lib/request.js#L322

It always worked like that for me until recently with the change to v4 of ExpressJS, using your type-is module. I fear it is breaking a lot of things ...

Can't you remove

  if (!(parseInt(headers['content-length'], 10)
    || 'transfer-encoding' in headers)) return;

for now? IMO robustness is more important than respecting weird W3 standards that do not make much sense.

yeah i guess we could. anyone else have input? maybe we can change this library to only check a type against an array of types

Or introduce a new option called strict which is false by default and when true, check the content-length as usual.

It always worked like that for me until recently with the change to v4 of ExpressJS, using your type-is module. I fear it is breaking a lot of things ...

The purpose of the 3.x -> 4.x version number change was because stuff is going to break because it's not all backwards-compatible.

Please do not remove that logic out of rush. @binarykitchen you posted above a GET request... A GET request would never have a body and req.is only works with bodies. Why would you use that on a GET request? Are you sure you didn't mean to make a POST request with some JSON or send an Accept header and use req.accepts?

Aaahhhh, @dougwilson, you nailed it. I had a mess between req.is and req.accepts - now I fixed it in my app and everything works fine now!

I wish superagent would have shown a warning in my case. I was setting the content type with .type() for a GET request which was wrong. See http://visionmedia.github.io/superagent/#setting-the%20content-type
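The mix-up comes down to which header answers which question: Content-Type describes the body being sent, while Accept describes what the client wants back. A rough sketch of the headers each kind of request would carry, using plain objects and made-up helper names (buildGetHeaders and buildPostRequest are assumptions for illustration, not superagent or express APIs):

```javascript
// Hypothetical helpers, for illustration only.

// A GET that wants JSON back: say so with Accept and send no Content-Type.
function buildGetHeaders() {
  return { accept: 'application/json' }
}

// A POST that sends JSON: Content-Type describes the body, and
// Content-Length signals that a body is present.
function buildPostRequest(payload) {
  const body = JSON.stringify(payload)
  return {
    headers: {
      'content-type': 'application/json',
      'content-length': String(Buffer.byteLength(body)),
      accept: 'application/json'
    },
    body: body
  }
}

console.log(buildGetHeaders())
console.log(buildPostRequest({ title: 'demo' }).headers)
```

On the server, the first request is one to inspect with req.accepts and the second with req.is.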

All good. Case closed. But I think we should document a little more. Note in capitals that type-is only works with bodies and so on.

@jonathanong Sorry about all that. Hope we all learned something here :)

that's weird. GET requests could still have bodies (though they shouldn't). but i guess your request really had no body and had the content-type header?

but i guess your request really had no body and had the content-type header?

@binarykitchen was actually manually adding the content-type header, which is how it got there.

GET requests could still have bodies (though they shouldn't).

Yes, a GET request can certainly have a body, but who does that??

i'm going to remove the automatic body checking by default and expose it as a utility. i'd like to use this in non-HTTP settings.

Hey all,

We're also running into the same issue after updating to the latest express version. I checked other implementations of content type checking (e.g. Rack/Rails), and they correctly handle the case of an empty message body, and express/type-is should handle this correctly, too.

I'm happy to submit a PR that fixes that, as the current behaviour makes no sense: There's no way to send an empty content-body while specifying a content-type, and have the application handle that correctly.

setting a content-type without a body doesn't make any sense. either way, you can use typeis.is() directly: https://github.com/expressjs/type-is/blob/master/index.js#L6 (it's undocumented in the readme, but it's not going to change anytime soon)

@jonathanong From: http://www.w3.org/2001/tag/doc/whenToUseGet.html#safe

Note that it is possible to use POST even without supplying data in an HTTP message body. In this case, the resource is URI addressable, but the POST method indicates to clients that the interaction is unsafe or may have side-effects.

we're talking about content types + no body, not just having no message body. the request method is irrelevant. you can always send no body if you'd like.

it's like telling the server, "here's nothing, and i'm going to name it Amanda". it doesn't make sense, and ignoring it makes servers and developers less prone to errors

Also, from http://tools.ietf.org/html/rfc7230

A user agent SHOULD send a Content-Length in a request message when
no Transfer-Encoding is sent and the request method defines a meaning
for an enclosed payload body. For example, a Content-Length header
field is normally sent in a POST request even when the value is 0
(indicating an empty payload body).

that's irrelevant as well. this checks content-type. if the content-length is 0, then there's no content.

No content and empty content are two different things.

technically, yes, but for practical purposes, no. what's a case that you'll actually need "empty content" and why would a content-type for this "empty content" be actually useful?

We're also running into the same issue after updating to the latest express version

I assume you went from 3.x -> 4.x? That is bound to have breaking changes, of course (the reason for a major version number change in semver). The change talked about here was a deliberate change, not accidental.

@arthurschreiber thanks for hijacking this thread. Can you show a use-case where you would need to know the content-type of a zero-length body? Usually when people want to do that, they are using the incoming content-type to determine what format they should respond to the client in, which is invalid, because that's what the accept header is for.

@dougwilson Hey there, thanks for chiming in. I hope I did not come off as cocky in my previous posts (and if I did so, I'm terribly sorry).

I understand that moving from 3.x to 4.x involves breaking changes, but I'd hope those would be in the API, and not in basic handling of HTTP requests/responses.

Usually when people want to do that, they are using the incoming content-type to determine what format they should respond to the client in, which is invalid, because that's what the accept header is for.

That's not what the spec says:

A request without any Accept header field implies that the user agent
will accept any media type in response.

There is no mention that using the Content-Type to sniff out what response type is expected is prohibited. I understand that this is probably not the right way to do it, but my problem is that I can't influence what some clients of our application send (e.g. because they use some third party tools that are batshit crazy and don't follow "the right way").

From looking through the HTTP RFCs, I don't see any mentions that the behaviour of express 3.x was wrong or broken. It definitely was not encouraged, but also not discouraged, and it's what other frameworks like rails do.

I understand if you feel like this discussion stopped being useful, and I'll back off if that's the case. @jonathanong mentioned a workaround I could use in our app, so I'm ok on that front. Still, I wanted to understand where this sudden change was/is coming from, because I can't really find anything in the RFCs that would warrant it. 😄

Hey there, thanks for chiming in. I hope I did not come off as cocky in my previous posts (and if I did so, I'm terribly sorry).

No, it was mostly from the old thread resurrection, rather than creating a new issue (and perhaps linking to an old thread). It just makes looking back at old threads harder to follow.

There is no mention that using the Content-Type to sniff out what response type is expected is prohibited. I understand that this is probably not the right way to do it, but my problem is that I can't influence what some clients of our application send (e.g. because they use some third party tools that are batshit crazy and don't follow "the right way").

Correct, if no accept header is provided, you are technically free to sniff whatever header you want. In that case, you may want to add Vary: Content-Type in your response headers so caching proxies do not get confused.
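A minimal sketch of that suggestion, assuming a Node-style req.headers object and res.setHeader method. The function name fallbackResponseType and the supported list are made up for illustration; Accept parsing itself is left to real negotiation code.

```javascript
// Hypothetical helper: only handles the "no Accept header" fallback.
function fallbackResponseType(req, res, supported) {
  // If the client sent Accept, normal content negotiation should decide.
  if (req.headers['accept']) return null

  const header = req.headers['content-type']
  if (!header) return null

  // Drop parameters such as "; charset=utf-8" before comparing.
  const type = header.split(';')[0].trim().toLowerCase()
  if (!supported.includes(type)) return null

  // The response now depends on the request's Content-Type, so tell caches.
  // Note: this overwrites any Vary value set earlier.
  res.setHeader('Vary', 'Content-Type')
  return type
}

// Usage with stub request/response objects.
const sentHeaders = {}
const stubRes = { setHeader: function (name, value) { sentHeaders[name] = value } }
const stubReq = { headers: { 'content-type': 'application/json; charset=utf-8' } }

console.log(fallbackResponseType(stubReq, stubRes, ['application/json', 'text/html'])) // application/json
console.log(sentHeaders) // { Vary: 'Content-Type' }
```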

From looking through the HTTP RFCs, I don't see any mentions that the behaviour of express 3.x was wrong or broken.

Correct, though RFC 2616 (which was what existed when the change was made) had a slightly stronger opinion than the newer HTTP/1.1 RFCs do.

Specifically, the change was made so people could do req.is and then just parse the body. Most content-types are invalid when they have zero-length (think images, JSON, pretty much all binary formats, etc.). The express 4.x req.is maps to this library's typeofrequest, which determines the type of a request as a whole, rather than just sniffing the content-type. I'm not familiar with Rails, but it seems likely the API you are referring to is like typeis.is in this lib, rather than like typeofrequest.

Can you point me to the API in Rack/Rails that is different from this? So far I only found request.content_type, whose Node.js equivalent is req.headers['content-type'], but I may easily be missing what you are talking about since I'm not familiar with Ruby things :)

a workaround I could use in our app, so I'm ok on that front

Yea, that work-around is 100% valid and we don't intend to change that function (which is to look only at content-type header).

I think you have the wrong library. I think you want https://github.com/expressjs/accepts

Maybe I'm missing something, but accepts is for parsing the Accept header, and won't help me in determining what content type I should respond with if no Accept header was provided in the first place.

Again, I understand this is a pretty special edge case, that 99.9% of the users won't care about, and I don't want to pester you too much about this. 😄

Our current code tries req.accepts, and then we try to fall back to req.is, as a last resort.

OK, thanks for the links to the Rails code :) So I see where the confusion is. Yes, the request.formats and request.accepts do indeed correspond to the accepts library (req.accepts in express). What's happening is that Rails goes to extra lengths to fall back to content-type sniffing. Basically, sniffing the content-type when there is no accept header is actually a "feature" of Rails. In this case, if you wanted it built into express, it would actually be a feature request on the accepts library, which would in turn use typeis.is from this library :) I don't think that is a good thing to do, though, so I doubt we'll implement that weird fallback, though you are certainly free to do so.

Basically you just want the fallback behavior that Rails does in your express code, I certainly get it. I'd suggest adding this middleware in your express app to simplify your life:

var accepts = require('accepts')
var typeis = require('type-is')

app.use(function(req, res, next){
  // Rails-like accepts with content-type fallback
  req.accepts = railsAccepts
  next()
})

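function railsAccepts(){
  // called as req.accepts(...), so `this` is the request
  var accept = accepts(this)
  var contentType = this.headers['content-type']

  // try the Accept header first, then fall back to the content-type
  return accept.types.apply(accept, arguments)
    || typeis.is.bind(null, contentType).apply(null, arguments)
}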