piercefreeman / grooveproxy

Groove, a crawling and unit test optimized MITM proxy server.

Optional disk-based cache

piercefreeman opened this issue · comments

Implement disk-based caching so that the contents of the current in-memory cache persist across subsequent runs.

Cache library

There are a few options by way of Go cache libraries. A non-exhaustive list:

  • diskv - primarily a disk-based cache that also keeps a bounded set of entries in memory. Simple BE implementation that relies on converting structs to bytes. Allows setting a max size for the in-memory cache.
  • go-cache - memory-based cache, comparable to memcached.
  • gocache - general-purpose cache that supports ristretto, redis, etc., as well as cache chaining, so we can attempt one store before falling back to the others. Also implements a marshaler that can serialize structs to bytes with the same API. ristretto supports the notion of maximum cost, which handles setting a maximum size for the in-memory cache. No disk-based store is provided without bringing in a third-party dependency, so we would have to provide our own implementation.

None of these caching libraries appear to support setting a maximum size for the on-disk cache, so we have to implement that ourselves. We move forward with diskv because it is the simplest of the above options and already has disk and memory handling baked in.
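To illustrate the "converting structs to bytes" step that a byte-oriented store relies on, here is a minimal sketch using only the standard library. The CachedResponse type, its fields, and the entry.gob file name are hypothetical and not taken from Groove; encoding/gob is just one possible encoding, and the file write stands in for whatever the chosen cache library does on disk.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// CachedResponse is a hypothetical record shape, used only for illustration.
type CachedResponse struct {
	URL    string
	Status int
	Body   []byte
}

// encode serializes a record to bytes so it can be handed to a byte-oriented store.
func encode(r CachedResponse) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode restores a record from bytes previously produced by encode.
func decode(data []byte) (CachedResponse, error) {
	var r CachedResponse
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&r)
	return r, err
}

func main() {
	// A temporary directory stands in for the on-disk cache location.
	dir, err := os.MkdirTemp("", "groove-cache-example")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	data, err := encode(CachedResponse{URL: "https://example.com/", Status: 200, Body: []byte("hello")})
	if err != nil {
		panic(err)
	}
	path := filepath.Join(dir, "entry.gob")
	if err := os.WriteFile(path, data, 0o600); err != nil {
		panic(err)
	}

	// A later run would read the bytes back and decode them.
	raw, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	got, err := decode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.URL, got.Status, string(got.Body))
}
```

The len(data) of each encoded entry is also the natural unit for tracking how much space the disk cache is using.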

Maximum cache size / cache eviction

We want to avoid our in-memory and disk caches growing out of control. diskv evicts in-memory entries in an arbitrary order: its eviction loop ranges over d.cache, and because that is a standard Go map, iteration order is not guaranteed.

For simplicity we would like to use an LRU caching policy, evicting the objects that have gone the longest without access. With a big enough cache ceiling this effectively means that commonly requested objects continually bubble to the top of the queue, while older long-tail values get bumped off the edge.
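A minimal sketch of such a policy, assuming a byte-size ceiling and an in-memory index built on container/list. LRUCache, NewLRUCache, and maxBytes are hypothetical names and not part of diskv or Groove; a disk-backed version would delete the evicted file where this sketch only drops the map entry.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached value plus the key needed to remove it from the index.
type entry struct {
	key   string
	value []byte
}

// LRUCache is a hypothetical size-bounded cache, not a diskv type.
type LRUCache struct {
	maxBytes  int                      // ceiling on the total size of stored values
	usedBytes int                      // current total size of stored values
	order     *list.List               // front = most recently used, back = least recently used
	items     map[string]*list.Element // key -> node in order
}

func NewLRUCache(maxBytes int) *LRUCache {
	return &LRUCache{maxBytes: maxBytes, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the value and marks the key as most recently used.
func (c *LRUCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Set stores the value, then evicts least recently used entries
// until the total size fits under maxBytes again.
func (c *LRUCache) Set(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		c.usedBytes += len(value) - len(e.value)
		e.value = value
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&entry{key: key, value: value})
		c.usedBytes += len(value)
	}
	for c.usedBytes > c.maxBytes && c.order.Len() > 0 {
		c.evictOldest()
	}
}

// evictOldest removes the entry at the back of the list.
func (c *LRUCache) evictOldest() {
	el := c.order.Back()
	if el == nil {
		return
	}
	e := c.order.Remove(el).(*entry)
	delete(c.items, e.key)
	c.usedBytes -= len(e.value)
}

func main() {
	c := NewLRUCache(10)
	c.Set("a", []byte("aaaa"))
	c.Set("b", []byte("bbbb"))
	c.Get("a")                 // "a" becomes most recently used
	c.Set("c", []byte("cccc")) // 12 bytes exceeds the ceiling, so "b" is evicted
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB) // prints "true false"
}
```

Note that in this sketch a single value larger than maxBytes is evicted immediately after insertion; whether to reject such values up front instead is a design choice left open here.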

API Support

We want to support two APIs with regard to the cache. One should set the cache mode. This endpoint already exists and doesn't have to change.

We'll have to add a second one to invalidate the cache, which is useful in situations like unit testing where you don't want to keep previous cache values around, for example when testing different modes of the cache scheme.

POST /api/cache/delete