sdispater / cachy

Cachy provides a simple yet effective caching library.

Cachy can't handle concurrent access, should use atomic cache creation

mjpieters opened this issue

We use nitpick, a project that lints linter and formatter configurations (a meta-linter, if you will). That project in turn uses cachy to cache remote style definitions.

Because the cache is shared among multiple CI jobs, we occasionally see issues with cache files not being found:

  File "/.../lib/python3.7/site-packages/cachy/repository.py", line 47, in get
    val = self._store.get(key)
  File "/.../lib/python3.7/site-packages/cachy/stores/file_store.py", line 43, in get
    return self._get_payload(key).get('data')
  File "/.../lib/python3.7/site-packages/cachy/stores/file_store.py", line 62, in _get_payload
    with open(path, 'rb') as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/.../.cache/nitpick/e1/dd/73/42/06/f6/97/29/e1dd734206f69729420b28a80c585edda296012ef36e3ac981749fa076c41975'

This happens because another process has unlinked the file between the os.path.exists() test and the open() call. We also see the cache file being empty:

  File "/.../lib/python3.7/site-packages/cachy/repository.py", line 47, in get
    val = self._store.get(key)
  File "/.../lib/python3.7/site-packages/cachy/stores/file_store.py", line 43, in get
    return self._get_payload(key).get('data')
  File "/.../lib/python3.7/site-packages/cachy/stores/file_store.py", line 65, in _get_payload
    expire = int(contents[:10])
ValueError: invalid literal for int() with base 10: b''

contents is the data read from the cache file, and contents[:10] is an empty bytes string.

Cachy should at the very least be resilient here to cache files going missing or being empty, and not fail with an exception.
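
For illustration, here is a minimal sketch of a more defensive read that treats both failure modes as a cache miss. This is not cachy's actual code: the miss-payload shape and the skipped deserialization of the stored blob are assumptions based on the tracebacks above.

def read_payload(path):
    """Defensive version of the cache-file read in _get_payload().

    A standalone sketch; deserialization of the stored data is elided.
    """
    miss = {'data': None, 'time': None}  # assumed miss-payload shape
    try:
        # No separate os.path.exists() check: another process may unlink
        # the file between the check and the open() (a TOCTOU race), so
        # just open it and treat FileNotFoundError as a miss.
        with open(path, 'rb') as fh:
            contents = fh.read()
    except FileNotFoundError:
        return miss
    if len(contents) < 10:
        # The 10-digit expiry header is missing or incomplete: another
        # process is still writing the file, so treat this as a miss too.
        return miss
    expire = int(contents[:10])
    return {'data': contents[10:], 'time': expire}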

However, these errors only happen because another process is concurrently writing to the cache file (in cachy.stores.file_store.FileStore.put()). If cachy wrote the cache data to a temporary file and then moved that file into place, the move would be atomic, provided the temporary file and the cache store location are on the same filesystem, and ._get_payload() would never observe a missing or partially written file.
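
A minimal sketch of such a write-then-rename, assuming the final file contents are available up front (atomic_put is a hypothetical helper, not part of cachy's API):

import os
import tempfile


def atomic_put(path, contents):
    """Write contents to path atomically via a temporary file.

    The temporary file is created in the destination directory so the
    final os.replace() is an atomic rename within one filesystem;
    readers see either the old complete file or the new complete file,
    never an empty or half-written one.
    """
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as fh:
            fh.write(contents)
        os.replace(tmp_path, path)
    except BaseException:
        # Only reached if the write or rename failed, so the temporary
        # file still exists and must be cleaned up.
        os.unlink(tmp_path)
        raise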

Cachy should probably also use some kind of locking mechanism to coordinate between multiple processes trying to write to the cache; one possible approach is sketched below.
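
For example, writers could be serialized with POSIX advisory locks (fcntl is Unix-only, so this illustrates the idea rather than a portable implementation):

import fcntl


def put_with_lock(path, contents):
    # Take an exclusive advisory lock on a sidecar .lock file so only
    # one process writes the cache file at a time; flock() blocks until
    # the current holder releases the lock.
    with open(path + '.lock', 'w') as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        try:
            with open(path, 'wb') as fh:
                fh.write(contents)
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)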

The following script forces the issue by using multiple concurrent subprocesses to write to the cache:

#!/usr/bin/env python3
"""Cachy file store stress test

Usage: cachy_stresstest [process-count]

Starts process-count subprocesses (defaults to 5), all storing a random blob of
data on cache miss, all with the expiration set to half a second. Prints out
worker names (capital ASCII letters), in green if a cached object was
successfully fetched or updated, red if an exception occurred. Processes exit
after 1 minute.

Exceptions are printed out as they happen.  Concurrent writing to the terminal
_may_ cause colours to be mixed up.

"""
import random
import string
import subprocess
import sys
import time
from tempfile import TemporaryDirectory
from typing import NoReturn

from cachy import CacheManager

size = 51200  # 50KB, larger than io.DEFAULT_BUFFER_SIZE
# Cachy accepts a value in minutes but doesn't prohibit float values
expiration = 0.5 / 60  # half a second


def worker(name: str, cache_dir: str) -> NoReturn:
    store = CacheManager(
        {"stores": {"file": {"driver": "file", "path": cache_dir}}}
    ).store()
    blob = bytes(random.choices(range(256), k=size))
    key = "cache_key"
    while True:
        cached, status = None, 92  # bright green
        try:
            cached = store.get(key)
        except Exception as e:
            status = 91  # bright red
            print(f"\n\x1b[1m{name}: {e.__class__.__name__}: {e}\x1b[m", flush=True)
        if cached is None:
            store.put(key, blob, expiration)
        print(f"\x1b[{status}m{name}\x1b[m", end="", flush=True)
        time.sleep(random.uniform(0.01, 0.25))


def main(count):
    print(f"Starting {count} processes to stress-test cache access")
    with TemporaryDirectory() as tempdir:
        procs = [
            subprocess.Popen([sys.executable, __file__, "worker", name, tempdir])
            for name in string.ascii_uppercase[:count]
        ]
        try:
            time.sleep(60)
        except KeyboardInterrupt:
            pass
        for proc in procs:
            proc.kill()


if __name__ == "__main__":
    if len(sys.argv) > 3 and sys.argv[1] == "worker":
        worker(*sys.argv[2:4])
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 5)

It'll hit both issues quite quickly, especially if you use more than the default 5 processes.
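
For example, to run with ten workers instead of the default five (assuming the script is saved as cachy_stresstest.py):

python3 cachy_stresstest.py 10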