bahrmichael / aws-scheduler

A scheduler for large volumes of time-precise events


Question: why set TTL to 10 mins AFTER scheduled time?

theburningmonk opened this issue

I'm looking at this line of code and it's not making sense to me. Why set the TTL to 10 mins AFTER the scheduled time? DynamoDB TTL doesn't delete items right away (when it's not a busy table, this can be anything between 10-30 mins after an item has expired), so why would you add even more delay to that process?

tl;dr: There used to be an update after TTL was reached.

There was some logic in the emitter which changed an item's status once it was emitted. I wanted to make sure that the DDB records were still available when being emitted. The docs said that deletion might take longer than the TTL timestamp (and all my tests confirmed that), but I didn't want to risk an error when updating an item that had already been deleted.

As the code evolved, I moved away from updating the long-term storage table towards using logs for analysis/recovery. Right now I don't see any reason why the +10m on the TTL would still be required.
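For illustration, a minimal sketch of how the TTL attribute could be written without the old +10m buffer. Table and attribute names here ("scheduled-events", "ttl", "payload") are assumptions, not the repo's exact code.

```python
import time
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("scheduled-events")  # placeholder table name

def put_event(event_id: str, scheduled_at_epoch: int, payload: str) -> None:
    # Previously the TTL was scheduled_at_epoch + 10 * 60 so the emitter could still
    # update the item after emission; with that update gone, the buffer can be dropped.
    table.put_item(
        Item={
            "id": event_id,
            "scheduled_at": scheduled_at_epoch,
            "ttl": scheduled_at_epoch,  # epoch seconds; DynamoDB deletes some time after this
            "payload": payload,
        }
    )

put_event("evt-1", int(time.time()) + 3600, '{"hello": "world"}')
```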

ah, so this is not the scheduled task itself, but another "update" task?

There was an update in an older version, but it was just a simple database write, not a task like the scheduled events.

With the latest version you should be able to remove the +10m on the TTL.

Also I haven't heard of anyone using this in production (apart from my own hobby projects). If you're seriously considering using this code, please let me know :)

I was thinking about it for a side project. I was actually going to ping @theburningmonk about his thoughts on it, too, as the AWS landscape changes. 😁

To be honest, I'd suggest against using this solution if you don't really need it. Here are a couple of simpler options (far fewer moving parts):

  • If you can tolerate delays of 30-60 minutes: DynamoDB TTL
  • If your tasks wait up to 15 minutes: SQS with DelaySeconds
  • If your tasks wait up to 1 year: Step Functions

Only if none of those make sense should you try to run this application :)
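For reference, a minimal sketch of the SQS option from the list above; the queue URL and function name are placeholders, not part of this repo.

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/delayed-tasks"  # placeholder

def schedule_within_15_minutes(payload: dict, delay_seconds: int) -> None:
    # SQS DelaySeconds supports at most 900 seconds (15 minutes)
    if not 0 <= delay_seconds <= 900:
        raise ValueError("SQS DelaySeconds only supports 0-900 seconds")
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(payload),
        DelaySeconds=delay_seconds,
    )

schedule_within_15_minutes({"task": "send-reminder"}, delay_seconds=600)
```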

@bahrmichael I do like the potential of DynamoDB TTL + step functions but curious what the reason would be to hold off on using this implementation. [edit]Other than the moving parts that is...[/edit]

My reason is that Step Functions has far more engineers supporting it and keeping it stable than this solution does :)

tl;dr: 🤔 there is that... 😄 point noted.

This repo is well thought out with metrics, exception handling, throttling, pagination, etc. Should the backing services change, yes that could be an issue. As your code is quite clear, it would presumably not be a huge issue. I'll go convince myself to do it the ☝️ other ways first. Thanks again for sharing the code and the writeup!

It's actually a challenge to find a nice clean way to do this with stock AWS components. What I'll try (sketched below) is:

  • create an S3 object with the payload
  • listen for the S3 object-created event
  • examine the expiration
  • if < 15 min, publish to SQS
  • if > 15 min and < 48 h, start a Step Functions execution (48 h being roughly the maximum actual delete time for DynamoDB TTL)
  • if > 48 h, write to DynamoDB with a TTL and pick up the actual delete with a Step Functions execution
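Very roughly, the routing Lambda for those steps might look like this. Every name here (queue URL, state machine ARN, table name, the "expires_at" field) is a placeholder assumption, not a working implementation.

```python
import json
import time
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
sfn = boto3.client("stepfunctions")
table = boto3.resource("dynamodb").Table("long-term-schedule")  # placeholder

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/short-delays"  # placeholder
STATE_MACHINE_ARN = "arn:aws:states:eu-west-1:123456789012:stateMachine:wait-and-emit"  # placeholder

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

        delay = body["expires_at"] - int(time.time())
        if delay < 15 * 60:
            # short waits: SQS DelaySeconds (max 900 seconds)
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(body),
                             DelaySeconds=max(delay, 0))
        elif delay < 48 * 3600:
            # medium waits: a Step Functions execution with a Wait state
            sfn.start_execution(stateMachineArn=STATE_MACHINE_ARN, input=json.dumps(body))
        else:
            # long waits: park in DynamoDB with a TTL; the actual delete gets picked up later
            table.put_item(Item={"id": key, "ttl": body["expires_at"], "payload": body})
```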

The hope is to

  • have a simple automatic backup with a sort of audit trail - no data lost, and (sort of) queryable with an S3 list
  • reduce request/retry/throttling logic
  • use S3 writes to avoid DynamoDB write throttling (3,500 writes per second per bucket prefix, with automatic retries!)
  • avoid querying for expiring objects
  • keep precision

Anything simpler that can keep all the gains of the various methods?

I like that. One of my challenges was loading data from DDB in very spiky situations. A cron job that runs every minute can only do so much.

Another idea: S3 has expiration dates as well, which are processed once a day according to this blog post. But imho DDB with TTL is good enough.

Ooooh they bumped the rule limit to 1000. I missed that bit. Just have to stash things with the right prefix.
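A sketch of what a prefix-per-delay lifecycle rule could look like; the bucket name and prefixes are placeholders. Keep in mind lifecycle expiration only runs roughly once a day, so this only suits coarse-grained schedules.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-scheduler-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-day-1",
                "Filter": {"Prefix": "delay-days/1/"},  # stash objects under a per-delay prefix
                "Status": "Enabled",
                "Expiration": {"Days": 1},
            },
            # ...up to 1,000 rules per bucket, one per prefix/delay bucket
        ]
    },
)
```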

I'm essentially seeing if I can avoid the query/paginate/throttling aspects and let the built-in systems do what they do best. I can program try/catch style for unexpected cases... like the time the cleanup got backlogged...