Named tasks cannot be recreated after they complete
shaunco opened this issue
I had left this comment in #106, but figured it was worth asking more directly...
I find it very strange that named tasks always "exist" (per `Storage.Exists`) once added to a queue, when it seems like they should stop existing once successfully executed... but `Storage` only has a single `Exists` method, which means `Consumer` has no way to remove a successfully executed task from storage.
`RedisStorage` holds these for 24 hours and `LocalStorage` holds them until 128,000 additional tasks have been queued.
An example:

1. Create a task and give it a name like `spider_www_google_com`.
2. Attempt to add a second task with the name `spider_www_google_com` - taskq rightfully returns `taskq.ErrDuplicate`.
3. The `spider_www_google_com` task completes successfully and `consumer.Put` is called, which calls `consumer.delete`, which calls `Queue.Delete`... this calls `Redis.XDel` on the ID for Redis and `scheduler.Remove` for memqueue.
4. Attempt to add a new task with the name `spider_www_google_com` - taskq returns `taskq.ErrDuplicate`.
I would expect step 4 to succeed, but it fails because `RedisStorage.Exists` calls `SetNX` on the task's name, whereas the call to `XDel` is on the task's ID... which leaves the name lock lingering for 24 hours. Similarly, `LocalStorage.Exists` puts the name in an LRU cache (of size 128,000), whereas `scheduler.Remove` uses a map keyed by the task's pointer.
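To make the key mismatch concrete, here is a tiny self-contained model of it (the types and method bodies are illustrative stand-ins, not taskq's actual code): dedupe is keyed by the task *name*, deletion by the message *ID*, so the name entry survives a successful run.

```go
package main

import "fmt"

// storage models the dedupe cache: Exists records the *name* and
// reports whether it was already present (like SetNX / the LRU cache).
type storage struct{ names map[string]bool }

func (s *storage) Exists(name string) bool {
	if s.names[name] {
		return true
	}
	s.names[name] = true
	return false
}

// queue models Queue.Delete: only the message *ID* is removed;
// the name entry in storage is untouched.
type queue struct{ byID map[string]string } // message ID -> name

func (q *queue) Delete(id string) { delete(q.byID, id) }

// reproduce returns the duplicate flags for the first and second add.
func reproduce() (bool, bool) {
	s := &storage{names: map[string]bool{}}
	q := &queue{byID: map[string]string{}}

	dup1 := s.Exists("spider_www_google_com") // step 1: accepted
	q.byID["msg-1"] = "spider_www_google_com"

	q.Delete("msg-1") // step 3: task completed, ID deleted

	dup2 := s.Exists("spider_www_google_com") // step 4: name still lingers
	return dup1, dup2
}

func main() {
	fmt.Println(reproduce()) // false true
}
```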
The easy fix is to extend the `Storage` interface with a `Delete` method that can be called when a task completes successfully... however, for Redis it is probably best to simply pass both the task's ID and name to `XDel` to avoid needing a second round-trip to Redis, leaving `RedisStorage.Delete` to do nothing.
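A rough sketch of what that extension could look like, using a map-backed storage as a stand-in for `LocalStorage` (the `Delete` method is the proposal here, not an existing taskq API):

```go
package main

import "fmt"

// Storage as proposed: Exists for dedupe on enqueue, plus a new Delete
// for cleanup once the task completes successfully.
type Storage interface {
	Exists(key string) bool
	Delete(key string) // proposed addition
}

type mapStorage struct{ seen map[string]bool }

func (s *mapStorage) Exists(key string) bool {
	if s.seen[key] {
		return true
	}
	s.seen[key] = true
	return false
}

func (s *mapStorage) Delete(key string) { delete(s.seen, key) }

// demo shows the name becoming reusable after cleanup.
func demo() (bool, bool) {
	var st Storage = &mapStorage{seen: map[string]bool{}}
	first := st.Exists("spider_www_google_com") // accepted
	st.Delete("spider_www_google_com")          // consumer calls this on success
	again := st.Exists("spider_www_google_com") // accepted again
	return first, again
}

func main() {
	fmt.Println(demo()) // false false
}
```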
Thoughts? Am I missing something here?
The idea is to be able to execute certain tasks once a day / period. So we chose a task name like `spider_www_google_com_20210101` and use Redis to deduplicate tasks by name. Looks like your expectations / requirements are a bit different.
Now that you say it, I realize I completely overlooked https://github.com/vmihailenco/taskq#message-deduplication and how `OnceInPeriod`, `OnceWithDelay`, and `OnceWithSchedule` craft message names around that behavior.
What are your thoughts on something like:

```go
task.WithArgs(ctx, "my input", 42).OnceUntilComplete()
```

in order to trigger cleanup of the Redis/LocalStorage dedupe cache on completion of the task?
That is possible, but I can't describe/understand what guarantees such logic provides...
I think the confusion on my side is that currently setting a name really acts as though I called a `OncePerDay` method, when I really just expect it to be "once in the queue at a time for the given name". The proposed `OnceUntilComplete()` would just guarantee that any attempts to add the same named task to the queue prior to completion of the existing task would return `taskq.ErrDuplicate`.
Continuing with the spider example, imagine there is a website where people can request to have their site indexed. Sites are indexed in FIFO order. Users are free to request their site be re-indexed at any time, but to save memory/queue length I only want each requested site in the queue once (yes, this memory constraint is contrived - in my real use case taskq is being used on an IoT device that is actually memory constrained). So, if someone requests "www.google.com" to be indexed, and there is already a pending task for "www.google.com", the site would just return "your site will be indexed soon". If there was not a pending task for "www.google.com", a new task would be added to the end of the queue.
The alternative to this is that I have to maintain a totally separate list of what is currently in the queue to ensure I don't add duplicates to the queue. Given the Redis and LRU maps that already exist in taskq, extending `Message` with a `CleanupOnceOnComplete` flag and extending `Storage` with a `Delete` method seems to be the most elegant solution.
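For concreteness, here's how that flag might thread through the consumer's success path. Everything here is a hypothetical sketch: `CleanupOnceOnComplete`, `Storage.Delete`, and the stripped-down `Consumer` are proposals, not taskq's current API.

```go
package main

import "fmt"

// Message with the proposed flag (other real fields elided).
type Message struct {
	ID                    string
	Name                  string
	CleanupOnceOnComplete bool // proposed: release the name on success
}

// Storage with the proposed Delete method.
type Storage interface {
	Exists(key string) bool
	Delete(key string)
}

type mapStorage struct{ seen map[string]bool }

func (s *mapStorage) Exists(key string) bool {
	if s.seen[key] {
		return true
	}
	s.seen[key] = true
	return false
}
func (s *mapStorage) Delete(key string) { delete(s.seen, key) }

// Consumer.delete models the success path: queue removal by ID is
// elided; the new part is releasing the dedupe name when asked to.
type Consumer struct{ storage Storage }

func (c *Consumer) delete(msg *Message) {
	if msg.CleanupOnceOnComplete && msg.Name != "" {
		c.storage.Delete(msg.Name)
	}
}

func main() {
	st := &mapStorage{seen: map[string]bool{}}
	c := &Consumer{storage: st}
	msg := &Message{ID: "msg-1", Name: "spider_www_google_com", CleanupOnceOnComplete: true}

	fmt.Println(st.Exists(msg.Name)) // false: first enqueue accepted
	c.delete(msg)                    // task completed successfully
	fmt.Println(st.Exists(msg.Name)) // false: re-enqueue accepted again
}
```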
Happy to submit a PR for this, just want to make sure there is a chance you'll accept it before I do the work.