Use case? Persistent concurrent set.



kmatt opened this issue 2 years ago.

I need to keep track of a list of files processed by a multiprocess application. A Redis SET would work, but I would prefer not to manage a separate daemon process. Redislite has been a bit problematic, leaving orphaned processes behind when the app terminates.

This module may be a solution when using a Cache with no expirations or evictions? I need the set to be kept permanently.

One thing not clear is if an Index would be better suited. A dictionary with an unused value (1) could emulate a set. My current Cache proof of concept (POC) uses incr() to detect files that have been queued more than once, which would point to a possible logic error in the processing.

Grant Jenks · Answer 1 · Mon Aug 29 2022 07:18:51 GMT+0800 (China Standard Time)

> This module may be a solution when using a Cache with no expirations or evictions? I need the set to be kept permanently.

Sure, that's reasonable.

> One thing not clear is if an Index would be better suited.

Seems better.

> A dictionary with an unused key (1) could emulate a set.

I think you mean an unused value.

See also: https://grantjenks.com/docs/diskcache/case-study-web-crawler.html

Matt Keranen · Answer 2 · Mon Aug 29 2022 23:15:15 GMT+0800 (China Standard Time)

> I think you mean an unused value.

Correct, I updated the question. The web crawler case study is instructive, thanks!

Is there documentation that describes when an Index() is not a good option? Or, as an extension of Cache(), is it suitable in all equivalent cases?

https://grantjenks.com/docs/diskcache/tutorial.html#index

Grant Jenks · Answer 3 · Mon Aug 29 2022 23:22:15 GMT+0800 (China Standard Time)

Index() simply uses Cache() under the hood, and it follows Python's Mapping API.