URLs which don't support HEAD requests or don't return text/html on HEAD don't work

Question

SuperTux88 opened this issue 5 years ago

Is there a reason why this does a HEAD request first (triggered by this html? check), only to do a GET request if that succeeds and then check html? again? Wouldn't it be simpler (in the code, and with fewer requests) to do the GET request from the start and check just once whether the response is HTML?

This is a problem if servers don't support HEAD requests or don't send all headers in the HEAD response (I don't know why servers do this, but that was the problem here).

Jonne Haß · Answer 1 · Tue Jan 08 2019 04:23:00 GMT+0800 (China Standard Time)

The problem this avoids is that faraday has no way to avoid eagerly reading the body when doing a request, so if we were to request a big file, or worse, an endless stream, we would eventually OOM.

Benjamin Neff · Answer 2 · Tue Jan 08 2019 04:43:10 GMT+0800 (China Standard Time)

Oh, that problem ... I remember now: multiple diaspora pods were "listening" to a radio stream somebody had posted, and then one after another disconnected after a while because they ran out of memory. I knew there was probably a reason I hadn't thought of.

So that's not fixable without (re)introducing an even bigger problem, unless we find something that can do GET requests and cancel if the response isn't html.

Jonne Haß · Answer 3 · Tue Jan 08 2019 04:53:44 GMT+0800 (China Standard Time)

Yeah, finding that wouldn't be too hard actually, it's just that this library would also sacrifice the HTTP client independence that faraday provides :/
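
For illustration only, here is a minimal sketch of the "GET and cancel" idea from this thread. It assumes Ruby's standard Net::HTTP instead of faraday (so it gives up the client independence mentioned above), and the names html?, fetch_html and MAX_BODY_BYTES are hypothetical, not part of the library. It is not the library's implementation, just one way the approach could look.

```ruby
require "net/http"
require "uri"

# Hypothetical cap on how much of the body we are willing to buffer.
MAX_BODY_BYTES = 1_000_000

# True if the response is a 2xx response whose Content-Type is text/html
# (parameters such as "; charset=utf-8" are ignored by #content_type).
def html?(response)
  response.is_a?(Net::HTTPSuccess) &&
    response.content_type.to_s.downcase == "text/html"
end

# Does a single GET and returns the (possibly truncated) HTML body,
# or nil if the response is not HTML. No HEAD request is needed.
def fetch_html(url, max_bytes: MAX_BODY_BYTES)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    request = Net::HTTP::Get.new(uri)
    # When given a block, Net::HTTP yields the response once the headers
    # have arrived, before the body has been read.
    http.request(request) do |response|
      # Returning from the method here skips reading the body entirely;
      # Net::HTTP.start then closes the connection on the way out.
      return nil unless html?(response)

      body = String.new
      response.read_body do |chunk|
        body << chunk
        break if body.bytesize >= max_bytes # stop on oversized bodies
      end
      return body.byteslice(0, max_bytes)
    end
  end
end
```

With something like this, a radio stream served as audio/mpeg would be dropped right after its headers, and an endless stream mislabelled as text/html would still be cut off at the cap rather than buffered until the process runs out of memory.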