jhass / open_graph_reader

A library to fetch and parse OpenGraph properties from a URL or a given string.

URLs which don't support HEAD requests or don't return text/html on HEAD don't work

SuperTux88 opened this issue · comments

Is there a reason why this does a HEAD request first (triggered by this html? check), only to do a GET request if that succeeded and then check html? again? Wouldn't it be simpler (less code, and fewer requests) to do the GET request from the beginning and check just once whether the response is HTML?

This is a problem if servers don't support HEAD requests or don't include all headers in the HEAD response (I don't know why servers do this, but that was the problem here).

The problem this avoids is that faraday has no way to avoid eagerly reading the body when doing a request, so if we were to request a big file or, worse, an endless stream, we would eventually OOM.

Oh, that problem ... I remember now when there were multiple diaspora-pods "listening" to a radio-stream somebody posted, and then one after another disconnected after a while because they were running OOM. I knew there was probably a reason that I hadn't thought about.

So that's not fixable without (re)introducing an even bigger problem unless we find something that can do GET requests and cancel if something doesn't return html.

Yeah, finding that wouldn't be too hard actually, it's just that this library would also sacrifice the HTTP client independence that faraday provides :/
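
For illustration, here is a minimal sketch of the "GET, then cancel if it isn't HTML" idea, using Ruby's standard Net::HTTP directly instead of faraday (so it gives up the client independence mentioned above). This is not code from open_graph_reader: the names fetch_html, html_content_type? and MAX_BODY_BYTES are hypothetical, and redirects and timeouts are left out. It relies on the block form of request_get, which yields the response before the body has been read, so returning from inside the block skips the body entirely.

```ruby
require "net/http"
require "uri"

# Hypothetical cap on how much of a body we are willing to buffer.
MAX_BODY_BYTES = 1_048_576

# True if a Content-Type header value denotes text/html
# (ignores parameters such as "; charset=utf-8").
def html_content_type?(value)
  value.to_s.split(";").first.to_s.strip.downcase == "text/html"
end

# Hypothetical helper: GET the URL, but only read the body if the
# response headers say it is HTML. Returns the body, or nil otherwise.
def fetch_html(url, limit: MAX_BODY_BYTES)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    # With a block, the response is yielded before the body is read.
    http.request_get(uri.request_uri) do |response|
      # Returning here leaves the body unread; Net::HTTP.start then
      # closes the connection on the way out.
      return nil unless response.is_a?(Net::HTTPSuccess)
      return nil unless html_content_type?(response["content-type"])

      body = String.new
      response.read_body do |chunk|
        body << chunk
        # Guard against huge or endless "HTML" responses as well.
        return nil if body.bytesize > limit
      end
      return body
    end
  end
end
```

Called as fetch_html("https://example.com/"), this would return the page body for an HTML response and nil for something like an audio stream, without buffering the stream. The size cap is an extra safeguard assumed here, since a server could also label an endless response as text/html.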