grantjenks / python-diskcache

Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.

Home Page: http://www.grantjenks.com/docs/diskcache/

Deque `peekleft` blocks infinitely after corruption(?)

i404788 opened this issue

commented

I ran into this issue today. I'm unsure of how it's caused, but len(q) == 1, list(q) == [], and q.peekleft() blocks infinitely (as opposed to raising the IndexError it's supposed to).
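(For reference, on a healthy deque peekleft either returns the front item or raises IndexError right away when the deque is empty; a minimal sketch, directory name arbitrary:)

from diskcache import Deque

q = Deque(directory='/tmp/healthy-deque')  # freshly created, empty deque
q.peekleft()  # raises IndexError immediately on an empty deque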

I've attached both the resolved db and the wal/shm files that were present before analysis. Both versions cause the same issue.
wal-resolved.zip
wal-unresolved.zip

It seems like it loops infinitely inside _cache.peek.

Can you provide code to reproduce the problem? I’m not comfortable downloading your database files.

Looking at the code for peek() in core.py, it seems like maybe the file referenced by the cache disappeared: the cache returns a reference to a file to be read, but the read fails, so it attempts the read again.
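Paraphrasing that hypothesis as a sketch (select_front_row and fetch_value_file are hypothetical stand-ins, not the library's actual internals): if the value file never reappears, a retry-on-error loop around the fetch never terminates.

def select_front_row():
    # Stand-in: the SQLite index still holds a row pointing at a file.
    return {'filename': 'missing.val'}

def fetch_value_file(row):
    # Stand-in: the referenced .val file has disappeared from disk.
    raise FileNotFoundError(row['filename'])

def peek_with_retry():
    while True:
        row = select_front_row()
        try:
            return fetch_value_file(row)
        except FileNotFoundError:
            continue  # file never reappears -> infinite loop

# Calling peek_with_retry() here would hang, mirroring the reported freeze.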

commented

I have no code to create the corrupted database, since it happened in a running application, but effectively the following happens:

import asyncio
from uuid import uuid4
from pydantic import BaseModel
from datetime import datetime
from diskcache import Deque

class SomeData(BaseModel):
    attr: datetime

def occasional_call():
    Deque(directory='./somedir').append((SomeData(attr=datetime.now()), uuid4()))

async def continuous_process():
    q = Deque(directory='./somedir')
    while True:
        try:
            data, id = q.peekleft()
            if data.attr > datetime.now():
                # not due yet: rotate to the back and retry later
                q.rotate(-1)
                await asyncio.sleep(1.)
                continue
            q.popleft()
        except IndexError:
            await asyncio.sleep(1.)

The code that then causes the freeze is just:

async def continuous_process():
    q = Deque(directory='./somedir')
    try:
        data, id = q.peekleft()  # freezes here
    except IndexError:
        await asyncio.sleep(1.)
commented

Looking at the code for peek() in core.py, it seems like maybe the file referenced by the cache disappeared: the cache returns a reference to a file to be read, but the read fails, so it attempts the read again.

That could be correct; I do see the following in one of my REPL attempts:

>>> from diskcache import Deque
>>> x = Deque(directory='./')
>>> x.peekleft()
^CTraceback (most recent call last):
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 281, in fetch
    with open(op.join(self._directory, filename), 'rb') as reader:
FileNotFoundError: [Errno 2] No such file or directory: './e5/ce/06697b28165690cebf36d11ec1d9.val'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/persistent.py", line 478, in peekleft
    _, value = self._cache.peek(default=default, side='front', retry=True)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
KeyboardInterrupt

Move the Deque reference to a global variable. It could be that the constant reinitialization corrupts the database.
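(A minimal sketch of that suggestion, names carried over from the snippet above:)

from diskcache import Deque

# Created once at module import and shared by all callers, instead of
# constructing a new Deque on every call.
queue = Deque(directory='./somedir')

def occasional_call():
    queue.append(('some', 'item'))  # use the shared handle everywhere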

commented

Alright, I'll try that as well.

commented

Still happens quite often, even though I ensured they are only initialized once (added prints to make sure). Maybe it has to do with the rotate(-1)? The rest of the methods seem like they wouldn't cause any conflict.

commented

While testing this I discovered another issue: in the same code as above, certain items that are appended never actually end up being returned by peekleft, even though that's the only place they're consumed. Seems like Deque might just not be async-safe like the caches?

I don’t think there’s an asyncio-specific issue.

How big are the values? And how many concurrent continuous processes are running?

I wonder if one process writes the database entry and then, before it can write the value to a file, another process pops the database entry.
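(A toy illustration of that hypothesized interleaving, using a plain dict and files rather than diskcache internals: the entry becomes visible before its value file exists, so a concurrent reader can see the entry yet fail to open the file.)

import os
import tempfile
import threading
import time

directory = tempfile.mkdtemp()
index = {}  # stands in for the SQLite table: key -> value filename

def writer():
    index['front'] = os.path.join(directory, 'value.val')  # step 1: entry visible
    time.sleep(0.1)                                        # window where the race can hit
    with open(index['front'], 'wb') as f:                  # step 2: value file written
        f.write(b'payload')

def reader():
    time.sleep(0.05)  # lands inside the window between the two steps
    try:
        with open(index['front'], 'rb') as f:
            print(f.read())
    except FileNotFoundError:
        print('entry visible but value file missing')

t = threading.Thread(target=writer)
t.start()
reader()
t.join()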

commented

The values are ~3 KB and there is only a single continuous process running. It's a multi-producer, single-consumer architecture: occasional_call could run multiple times concurrently (in theory; we actually only use one worker), and continuous_process is always a single asyncio task.

If the values are only 3 KB then I don't see why it would use a file to store them. The disk min file size is 32 KB.
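(That threshold is diskcache's disk_min_file_size setting, 32768 bytes by default: values at or above it are written to separate .val files, smaller ones are stored inline in SQLite. A sketch of adjusting it, using the Deque.fromcache constructor:)

from diskcache import Cache, Deque

cache = Cache('./store', disk_min_file_size=2 ** 16)  # raise threshold to 64 KB
q = Deque.fromcache(cache)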

Are you able to create a reduced repro of the problem? I’d like to see it fail locally and without that I’m only guessing.

commented

Hmm, the contents shouldn't be that large, but I'll take a look at a real example in a bit.
It's very reproducible in the application (typically when it has at least one value which it needs to rotate), but I haven't been able to isolate it yet; I'll make an attempt at that tomorrow.

commented

Alright, it seems to be a bit larger than I thought: max 32400 bytes (per pickle.dumps). I can reduce it to ~16k since there is redundancy in there, but at the current size it does create files. I'll keep that in mind when attempting to isolate it.

commented

I was able to create a quick reproduction of the record-dropping issue:

from diskcache import Deque
import asyncio
from datetime import datetime, timedelta
from pydantic import BaseModel
from secrets import token_bytes
from uuid import uuid4


class Data(BaseModel):
    resolve_after: datetime
    blob: bytes


async def producer():
    q = Deque(directory='./store')
    while True:
        if len(q) < 10:
            blob = token_bytes(33000)
            print(f'appending {len(q)}')
            q.append((Data(resolve_after=datetime.now() +
                     timedelta(seconds=10), blob=blob), uuid4()))

        await asyncio.sleep(0.1)


async def consumer():
    q = Deque(directory='./store')
    while True:
        try:
            data: Data
            data, id = q.peekleft()
            if data.resolve_after > datetime.now():
                print(f'rotating.. {len(q)}')
                q.rotate(-1)
                await asyncio.sleep(1.)
                continue
            print('poping')
            q.popleft()
        except IndexError:
            print('empty queue')
            await asyncio.sleep(5.)
        await asyncio.sleep(0.1)

if __name__ == '__main__':
    async def run():
        task = asyncio.create_task(producer())
        await consumer()

    asyncio.run(run())

Logs (after filling the queue):

rotating.. 9
appending 8
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9

You can see it likely drops the record when rotating: if the interleaving were rot[pop] -> append -> rot[push], the length should read 11 at some point, but it never does, so it seems a record is dropped on each rotate. I haven't seen the freeze yet (it might need a nearly empty queue with a rotate).
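(Spelling out that length arithmetic from the logs above:)

# If no record were lost, an append landing between the two halves
# of a rotate would briefly push the length past the maximum:
#   start      len == 10
#   rot[pop]   len ==  9
#   append     len == 10
#   rot[push]  len == 11   <- never appears in the logs
# Instead the lengths oscillate between 9 and 10, so the rotated
# record is lost rather than re-pushed.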

Interestingly, with token_bytes(330) (or any small size) it behaves as expected:

rotating.. 10
rotating.. 10
rotating.. 10
rotating.. 10
rotating.. 10
rotating.. 10
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
rotating.. 10
rotating.. 10
rotating.. 10
commented

My guess was correct: changing the threshold to len(q) < 1 causes the freeze bug. Again, this only happens with token_bytes(33000), i.e. when the value is large enough to be stored in a file. Here is a log:

poping
empty queue
appending 0
rotating.. 1
^CTraceback (most recent call last):
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 281, in fetch
    with open(op.join(self._directory, filename), 'rb') as reader:
FileNotFoundError: [Errno 2] No such file or directory: './store/f1/9f/3c37a616004a856af46a15ce208c.val'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/sata0/diskcache-repro/./repro.py", line 49, in <module>
    asyncio.run(run())
  File "/home/null/mambaforge/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/null/mambaforge/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/home/null/mambaforge/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/home/null/mambaforge/lib/python3.10/asyncio/base_events.py", line 1906, in _run_once
    handle._run()
  File "/home/null/mambaforge/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/media/sata0/diskcache-repro/./repro.py", line 47, in run
    await consumer()
  File "/media/sata0/diskcache-repro/./repro.py", line 31, in consumer
    data, id = q.peekleft()
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/persistent.py", line 478, in peekleft
    _, value = self._cache.peek(default=default, side='front', retry=True)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
KeyboardInterrupt

You found a bug! Fixed in PR #288.

Released as version 5.6.3 to PyPI.
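(To pick up the fix, a standard pip upgrade suffices:)

pip install --upgrade "diskcache>=5.6.3"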