Deque `peekleft` blocks infinitely after corruption(?)
i404788 opened this issue · comments
I ran into this issue today. I'm unsure how it's caused, but `len(q) == 1`, `list(q) == []`, and `q.peekleft()` blocks indefinitely (as opposed to raising the `IndexError` it's supposed to).

I've attached both the resolved db and the wal/shm files which were there before analysis. Both versions cause the same issue.

wal-resolved.zip
wal-unresolved.zip

It seems like it loops infinitely inside `_cache.peek`.
Can you provide code to reproduce the problem? I’m not comfortable downloading your database files.
Looking at the code for `peek()` in core.py, it seems like maybe the file referenced by the cache disappeared: the cache returns a reference to a file to be read, but the read fails, and so it attempts the read again.
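That failure mode can be acted out with a standard-library toy. This is only an illustration of the retry pattern being described, not diskcache's actual code; `peek_with_retry`, its retry bound, and the file name are all hypothetical:

```python
import os
import tempfile

# Sketch of the described failure mode -- NOT diskcache's actual code.
# The database row still references a value file, so every retry re-reads
# the same stale row and re-attempts the same failing open().
def peek_with_retry(row_lookup, max_retries=5):
    """row_lookup() returns the filename the database row points at."""
    for _ in range(max_retries):  # the real retry loop has no such bound
        filename = row_lookup()
        try:
            with open(filename, 'rb') as reader:
                return reader.read()
        except FileNotFoundError:
            continue  # the row is unchanged, so the next attempt fails identically
    raise RuntimeError('row references a missing file')

tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, 'deadbeef.val')  # never written (or already deleted)
try:
    peek_with_retry(lambda: missing)
except RuntimeError as exc:
    print(exc)  # without the retry bound, this would spin forever instead
```

With no bound on the retries, nothing ever changes between attempts, which matches the observed infinite loop.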
I have no code to create the corrupted database, since it's on a running application, but effectively the following happens:

```python
from diskcache import Deque
from uuid import uuid4
from pydantic import BaseModel
from datetime import datetime
import asyncio


class SomeData(BaseModel):
    attr: datetime


def occasional_call():
    Deque('./somedir').append((SomeData(attr=datetime.now()), uuid4()))


async def continuous_process():
    q = Deque('./somedir')
    while True:
        try:
            data, id = q.peekleft()
            if not data.attr > datetime.now():
                q.rotate(-1)
                await asyncio.sleep(1.)
                continue
            q.popleft()
        except IndexError:
            await asyncio.sleep(1.)
```
The code which then causes the freeze is just:

```python
async def continuous_process():
    q = Deque('./somedir')
    try:
        data, id = q.peekleft()  # freezes here
```
> Looking at the code for peek() in core.py, seems like maybe the file referenced by the cache disappeared. So the cache returns a reference to a file to be read but the read fails and so it attempts the read again.

That could be correct; I do see the following in one of my REPL tries:
```
>>> from diskcache import Deque
>>> x = Deque(directory='./')
>>> x.peekleft()
^CTraceback (most recent call last):
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 281, in fetch
    with open(op.join(self._directory, filename), 'rb') as reader:
FileNotFoundError: [Errno 2] No such file or directory: './e5/ce/06697b28165690cebf36d11ec1d9.val'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/persistent.py", line 478, in peekleft
    _, value = self._cache.peek(default=default, side='front', retry=True)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
KeyboardInterrupt
```
Move the Deque reference to a global variable. It could be that the constant reinitialization corrupts the database.
Alright, I'll try that as well
It still happens quite often, even though I ensured they are only initialized once (added `print`s to make sure). Maybe it has to do with the `rotate(-1)`? The rest of the methods seem like they wouldn't cause any conflict.
While testing this I discovered another issue: in the same code as above, certain items that are `append`ed never actually end up being `peekleft`ed, even though that's the only place they're consumed. Seems like `Deque` might just not be async-safe like the caches?
I don’t think there’s an asyncio-specific issue.
How big are the values? And how many concurrent continuous processes are running?
I wonder if one process writes the database entry and then, before it can write the value to a file, another process pops the database entry.
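The suspected interleaving can be acted out with a standard-library toy. Here `rows` stands in for the SQLite table; this models the hypothesis above, not diskcache's actual write order:

```python
import os
import tempfile

# Stdlib-only simulation of the suspected race -- an assumption about the
# interleaving, not diskcache internals. The producer commits the database
# row *before* the value file exists on disk; a consumer that runs inside
# that window finds the row but not the file.
tmpdir = tempfile.mkdtemp()
rows = {}  # stands in for the SQLite table: key -> value filename

# producer, step 1: the row is committed, pointing at a file-to-be
rows['key0'] = os.path.join(tmpdir, 'value.val')

# consumer runs now, inside the window, and trusts the row
window_hit = False
try:
    with open(rows['key0'], 'rb') as reader:
        reader.read()
except FileNotFoundError:
    window_hit = True  # row exists, file does not: the suspected race

print('race window hit:', window_hit)

# producer, step 2: only now is the value file actually written
with open(rows['key0'], 'wb') as writer:
    writer.write(b'payload')
```

Whether the consumer then retries forever or pops (and deletes) the half-written entry would determine which of the two observed symptoms appears.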
The values are ~3 kB and there is only a single continuous process running. It's a multi-producer single-consumer architecture. So `occasional_call` could run multiple times concurrently (in theory; we actually only use one worker), and `continuous_process` is always one asyncio task.
If the values are only 3k then I don't see why it would use a file to store them. The disk min file size is 32k.
Are you able to create a reduced repro of the problem? I’d like to see it fail locally and without that I’m only guessing.
Hmm, the contents shouldn't be that large, but I'll take a look at a real example in a bit.
It's very reproducible in the application (typically when it has at least one value which it needs to rotate), but I haven't been able to isolate it yet; I'll make an attempt at that tomorrow.
Alright, it seems to be a bit larger than I thought: it's max 32400 bytes (`pickle.dumps`). I can reduce it to ~16k since there is redundancy in there, but then it does make files. I'll keep that in mind when attempting to isolate it.
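The size check described here can be sketched like so; the payload below is a stand-in for the real model, and the 32 KiB threshold is the default file-spill size mentioned earlier in the thread:

```python
import pickle
from datetime import datetime
from secrets import token_bytes
from uuid import uuid4

THRESHOLD = 32 * 1024  # default size above which a value is spilled to a .val file

# stand-ins for the real payloads, not the actual pydantic model
small = pickle.dumps(({'attr': datetime.now()}, uuid4()))
large = pickle.dumps(({'attr': datetime.now(), 'blob': token_bytes(33000)}, uuid4()))

# small payloads stay inside SQLite; large ones get their own value file
print(len(small), len(small) >= THRESHOLD)
print(len(large), len(large) >= THRESHOLD)
```

This explains why the bug only reproduces with larger values: below the threshold, no separate file is ever created, so there is no file to go missing.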
I was able to create a quick reproduction of the dropped-records issue:

```python
from diskcache import Deque
import asyncio
from datetime import datetime, timedelta
from pydantic import BaseModel
from secrets import token_bytes
from uuid import uuid4


class Data(BaseModel):
    resolve_after: datetime
    blob: bytes


async def producer():
    q = Deque(directory='./store')
    while True:
        if len(q) < 10:
            blob = token_bytes(33000)
            print(f'appending {len(q)}')
            q.append((Data(resolve_after=datetime.now() + timedelta(seconds=10),
                           blob=blob), uuid4()))
        await asyncio.sleep(0.1)


async def consumer():
    q = Deque(directory='./store')
    while True:
        try:
            data: Data
            data, id = q.peekleft()
            if data.resolve_after > datetime.now():
                print(f'rotating.. {len(q)}')
                q.rotate(-1)
                await asyncio.sleep(1.)
                continue
            print('poping')
            q.popleft()
        except IndexError:
            print('empty queue')
            await asyncio.sleep(5.)
        await asyncio.sleep(0.1)


if __name__ == '__main__':
    async def run():
        task = asyncio.create_task(producer())
        await consumer()

    asyncio.run(run())
```
Logs (after filling the queue):

```
rotating.. 9
appending 8
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
rotating.. 10
appending 9
```
You can see it likely drops the record when rotating: if it happened as rot[pop] -> append -> rot[push], then it should show length 11 at some point, so it does seem to drop a record on each rotate. I haven't seen the freeze yet (it might need a nearly empty queue with a rotate).
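A sketch of how such a drop could happen. This is a guess at the mechanism, not diskcache's code: if `rotate(-1)` is effectively "pop from the front, re-append at the back", and the re-append is skipped when fetching the popped value fails (e.g. its value file is already gone), the record silently disappears:

```python
from collections import deque

# Hypothetical pop-then-push rotate -- an assumption, not diskcache's code.
def rotate_left(q, fetch_ok):
    item = q.popleft()      # front record (and its backing file) removed
    if fetch_ok(item):      # the value must be re-read to re-append it
        q.append(item)      # only happens when the fetch succeeded
    # else: the record is dropped on the floor

q = deque(range(10))
rotate_left(q, fetch_ok=lambda item: item != 0)  # simulate a failed fetch of item 0
print(len(q))  # 9 -- one record lost by a single rotate
```

That would match the logs: the length never exceeds 10 because each rotate quietly removes the record the producer just replaced.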
Interestingly, with `token_bytes(330)` (or any small size) it behaves as expected:

```
rotating.. 10
rotating.. 10
rotating.. 10
rotating.. 10
rotating.. 10
rotating.. 10
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
poping
appending 9
rotating.. 10
rotating.. 10
rotating.. 10
```
My guess was correct: changing the producer condition to `len(q) < 1` causes the freeze bug. Again, this only happens with `token_bytes(33000)`, such that it creates a file. Here is a log:
```
poping
empty queue
appending 0
rotating.. 1
^CTraceback (most recent call last):
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 281, in fetch
    with open(op.join(self._directory, filename), 'rb') as reader:
FileNotFoundError: [Errno 2] No such file or directory: './store/f1/9f/3c37a616004a856af46a15ce208c.val'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/sata0/diskcache-repro/./repro.py", line 49, in <module>
    asyncio.run(run())
  File "/home/null/mambaforge/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/null/mambaforge/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/home/null/mambaforge/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/home/null/mambaforge/lib/python3.10/asyncio/base_events.py", line 1906, in _run_once
    handle._run()
  File "/home/null/mambaforge/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/media/sata0/diskcache-repro/./repro.py", line 47, in run
    await consumer()
  File "/media/sata0/diskcache-repro/./repro.py", line 31, in consumer
    data, id = q.peekleft()
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/persistent.py", line 478, in peekleft
    _, value = self._cache.peek(default=default, side='front', retry=True)
  File "/home/null/mambaforge/lib/python3.10/site-packages/diskcache/core.py", line 1702, in peek
    value = self._disk.fetch(mode, name, db_value, False)
KeyboardInterrupt
```
You found a bug! PR in #288
Released as version 5.6.3 to PyPI.