psf / cachecontrol

The httplib2 caching algorithms packaged up for use with requests.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

download of partial content is defeated by cachecontrol

dimbleby opened this issue · comments

suppose some large file is in the cache already and we make a request for partial content with a Range header.

understanding that caching of such things is likely to go wrong we also include the header Cache-Control: no-cache.

this header succeeds in preventing cachecontrol from responding immediately from the cache - but does not stop it from going through

request.headers.update(self.controller.conditional_headers(request))
in which it unconditionally looks in the cache
resp = self._load_from_cache(request)
. The result is that cachecontrol adds an If-Modified-Since header to the request, with timestamp per what is in the cache.

but now the server sees the header on that request, knows that the file has not been modified recently, and returns 304. And now cachecontrol returns the entire file from its cache anyway, and not the partial content that the server would have given.

related-ish: #246 asks for cachecontrol to cache partial content. But this one reports that cachecontrol makes partial content not work

A couple of plausible fixes come to mind

  • I guess relatively easy is: don't add the conditional headers if the request has a Range header on it
  • I guess relatively hard is: if cachecontrol already has the entire file in cache then it should be able to provide partial content from that cache
    • (if this worked then we wouldn't add the Cache-Control: no-cache header to the request)

Thanks for root causing this @dimbleby!

I haven't fully thought this through yet, but your first fix seems reasonable to me (and easy). I likely won't have time to do a patch for this in the immediate future, but we'd be happy to review one.

(Medium term I will probably have the time -- realistically, around March.)