URLs which don't support HEAD requests or don't return text/html on HEAD don't work
SuperTux88 opened this issue · comments
Is there a reason why this does a HEAD request first (triggered by this `html?` check), only to then do a GET request if this was successful and check if this was `html?` again? Wouldn't it be simpler (in the code, and also fewer requests needed) to do the GET request from the beginning and just check once if it's HTML?
This is a problem if servers don't support HEAD requests or don't send all headers in the HEAD response (I don't know why servers do this, but that was the problem here).
The problem this avoids is that Faraday has no way to avoid eagerly reading the body when doing a request, so if we were to request a big file or, worse, an endless stream, we would eventually OOM.
Oh, that problem ... I remember now: there were multiple diaspora pods "listening" to a radio stream somebody posted, and one after another they disconnected after a while because they ran out of memory. I knew there was probably a reason I hadn't thought of.
So that's not fixable without (re)introducing an even bigger problem, unless we find something that can do GET requests and cancel if the response isn't HTML.
Yeah, finding that wouldn't be too hard actually, it's just that this library would also sacrifice the HTTP client independence that faraday provides :/
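For illustration, here is a minimal sketch of what "GET and cancel if it isn't HTML" could look like using Ruby's standard `Net::HTTP` directly. It is not this library's code, and it gives up exactly the client independence mentioned above. The method name `fetch_html` and the `MAX_BODY_BYTES` cap are hypothetical, and the sketch ignores redirects and timeouts.

```ruby
require "net/http"
require "uri"

# Hypothetical cap on how much body we are willing to buffer.
MAX_BODY_BYTES = 1_000_000

# Returns the body as a String if the response is a successful text/html
# response no larger than max_bytes; returns nil otherwise.
def fetch_html(url, max_bytes: MAX_BODY_BYTES)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    # With a block, the response is yielded after the headers arrive
    # but before the body is read.
    http.request(Net::HTTP::Get.new(uri)) do |response|
      return nil unless response.is_a?(Net::HTTPSuccess)
      # content_type strips parameters such as "; charset=utf-8".
      return nil unless response.content_type == "text/html"

      body = +""
      # read_body with a block streams the body chunk by chunk.
      response.read_body do |chunk|
        body << chunk
        # Bail out on big files or endless streams instead of buffering them.
        return nil if body.bytesize > max_bytes
      end
      return body
    end
  end
end
```

Returning early from inside the block skips reading the rest of the body, and `Net::HTTP.start` closes the connection when its block exits, so a radio stream or large download is dropped after the headers (or after at most `max_bytes` plus one chunk) rather than read until the process runs out of memory.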