oduwsdl / hypercane

A toolkit for developing algorithms that sample mementos from a web archive collection.

Home Page: https://oduwsdl.github.io/hypercane

Add the ability to only use the cache

shawnmjones opened this issue

Some users may have built a cache from prior runs and may not want to issue new HTTP requests to add to it. We may not be able to prevent network access when the cache is supplied through an environment variable such as HTTPS_PROXY, but the faster MongoDB cache used by requests-cache can be overridden so that a cache miss does not open a network connection.

Create a new class named OnlyCachedSession that is a child of CachedSession. This class will skip the network connections provided by requests altogether.

The code below has worked in testing:

from requests.hooks import dispatch_hook
from requests_cache import CachedSession

class FailedCacheResponse(Exception):
    pass

class OnlyCachedSession(CachedSession):

    def send(self, request, **kwargs):
        # Serve the request from the cache only. A cache miss raises
        # FailedCacheResponse instead of falling back to the network.
        cache_key = self.cache.create_key(request)

        try:
            response, timestamp = self.cache.get_response_and_time(cache_key)
        except (ImportError, TypeError):
            raise FailedCacheResponse(
                "Import/Type Errors : could not get response and time : item {} is not in the cache".format(cache_key)
            )

        if response is None:
            raise FailedCacheResponse(
                "response is None : could not get response and time : item {} is not in the cache".format(cache_key)
            )

        # dispatch hook here, because we've removed it before pickling
        response.from_cache = True
        response = dispatch_hook('response', request.hooks, response, **kwargs)
        return response
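
On the calling side, a cache miss now surfaces as an exception rather than a network request, so callers need to decide what to do with URIs that are not cached. The sketch below illustrates one way to handle that, under stated assumptions: `DictOnlySession` is a hypothetical, dict-backed stand-in for OnlyCachedSession so that the example runs without requests-cache or MongoDB, and `fetch_available` is a hypothetical helper that is not part of Hypercane.

```python
class FailedCacheResponse(Exception):
    """Raised when a requested item is not in the cache."""


class DictOnlySession:
    """Hypothetical stand-in for OnlyCachedSession, backed by a plain dict.

    It mimics only the behavior that matters here: a hit returns the
    stored value, and a miss raises instead of going to the network.
    """

    def __init__(self, entries):
        self._entries = dict(entries)

    def get(self, uri):
        try:
            return self._entries[uri]
        except KeyError:
            raise FailedCacheResponse(
                "item {} is not in the cache".format(uri)
            ) from None


def fetch_available(session, uris):
    """Split uris into cached results and misses, in input order.

    Hypothetical helper: it never retries a miss, so no network
    traffic can result from calling it.
    """
    found = {}
    missing = []
    for uri in uris:
        try:
            found[uri] = session.get(uri)
        except FailedCacheResponse:
            missing.append(uri)
    return found, missing
```

A caller could log the `missing` list and continue with whatever was cached, or stop the run if any URI is missing. Which of these is appropriate likely depends on the sampling algorithm being run.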

With the work on #65, this may become even easier to implement.