adamalton / djangae-consistency

Mitigation against eventual consistency issues on the App Engine Datastore

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Djangae Consistency

Note: this project has now been merged into djangae as djangae.contrib.consistency. Pull requests should be submitted there.

A Django app which helps to mitigate against eventual consistency issues with the App Engine Datastore. Works only with Djangae.

In A Nutshell

It caches recently created and/or modified objects so that it knows about them even if they're not yet being returned by the Datastore. It then provides a function improve_queryset_consistency which you can use on a queryset so that it:

  • Uses the cache to include recently-created/-modified objects which match the query but which are not yet being returned by the Datastore. (This is not effective if the cache gets purged).
  • Does not return any recently deleted objects which are still being returned by the Datastore. (This is not cache-dependent.)

Usage

Add 'consistency' to settings.INSTALLED_APPS and then use as follows. (You may also need to import consistency.models to get the signals to register.)

from consistency import improve_queryset_consistency, get_recent_objects

# Example 1 - `improve_queryset_consistency`

queryset = MyModel.objects.filter(is_yellow=True)
more_consistent_queryset = improve_queryset_consistency(queryset)
# Use as normal


# Example 2 - `get_recent_objects`

queryset = MyModel.objects.filter(is_yellow=True)
objects_possibly_missed = get_recent_objects(queryset)
# Use both together to build your full set of results

Note that in both cases the recently-created objects are not guaranteed to be returned. The recently-created objects are only stored in memcache, which could be purged at any time.

Advanced Configuration

By default the app will cache recently-created objects for all models. But you can change this behaviour so that it only caches particular models, only caches objects that match particular criteria, and/or caches objects that were recently modified as well as recently created.

CONSISTENCY_CONFIG = {

    # These defaults apply to every model, unless otherwise overriden.
    # The values shown here are the default defaults (i.e. what you get if
    # you don't set CONSISTENCY_CONFIG at all).
    "defaults": {
        "cache_on_creation": True,
        "cache_on_modification": False,
        "cache_time": 60, # Seconds
        "caches": ["django"], # Where to store the cache.
        "only_cache_matching": [], # Optional filtering (see below)
    },

    # The settings can be overridden for each individual model
    "models": {
        "app_name.ModelName": {
            "cache_on_creation": True,
            "cache_on_modification": True,
            "caches": ["session", "django"],
            "cache_time": 20,
            "only_cache_matching": [
                # A list of checks, where each check is a dict of filter
                # kwargs or a function. If an object matches *any* of
                # these then it is cached. No filters means cache everything.
                {"name": "Ted", "archived": False},
                lambda obj: obj.method(),
            ]
        },
        "app_name.UnimportantModel": {
            "cache_on_creation": False,
            "cache_on_modification": False,
            # Any settings which you don't override inheit from "defaults".
        },
    },
}

Notes

  • Even if you set both cache_on_creation and cache_on_modification to False, you can still use improve_queryset_consistency to prevent stale objects from being returned by your query.
  • The way that improve_queryset_consistency works means that it converts your query into a pk__in query. This has 2 side effects:
    • It causes the initial query to be executed. It does this with values_list('pk') (which becomes a Datastore keys-only query) so is fast, but you should be aware that it hits the DB.
    • It introduces a limit of 1000 results, so if your queryset is not already limited then a limit will be imposed, and if your queryset already has a limit then it may be reduced. This is to ensure a total result of <= 1000 objects. This is imperfect though, and may result in slightly fewer than 1000 results because recent objects in the cache will reduce the limit even if they don't match the query. (This could potentially be fixed.)
  • To avoid the side effects of improve_queryset_consistency you may wish to use get_recent_objects instead, giving you slightly more control over what happens.
  • As you can see in the config section, there are 2 places which the new objects can be cached: in Django's normal cache (django.core.cache) or in the current user's session.
    • Using the "session" cache may be slightly faster for querying (as the session object has probably been loaded anyway, so it avoids another cache lookup), but it's unlikely to be faster when creating/modifying an object, because writing to the session requires a Database write, which is probably slower than a cache write. Unless you're altering the session object anyway, in which case the session cache may be advantageous.

About

Mitigation against eventual consistency issues on the App Engine Datastore

License:MIT License


Languages

Language:Python 100.0%