Abandoned WaitingRoom Requests will affect MaxSizeInlet strategy


andrzejbe opened this issue 2 years ago · comments

Describe the bug
When adding new requests to WaitingRoom the assumption is that each one of the queued requests will eventually get a token/session.

Based on this assumption, serving_num is later updated when, e.g., tokens/sessions expire - but NOT when a user abandons the waiting room before being granted a token/session.

In that case, the browser stops polling serving_num, leaving the corresponding RequestId entry in Redis. In an extreme example - specifically with the MaxSizeInlet strategy - if the number of people who abandon the WaitingRoom is greater than the number of people who leave the site after being granted a token (e.g. after a successful checkout or token expiry), the whole queue can get "stuck".

To Reproduce

1. set MaxSize to 10
2. add 30 requests to the queue
3. the first 10 requests will be "let in" and given a token/session
4. immediately afterwards, abandon the next 10 sessions in the waiting room - before a token is given to any of them
5. expire the 10 "active" sessions
6. serving_num should increase by 10 - but the users at queue positions 11-20 are no longer polling (!)
7. waiting room is stuck

Note that it only affects MaxSizeInlet strategy

Expected behavior
Some way (?) :) to detect when users waiting in the queue abandoned the request before being granted a token/session

Please complete the following information about the solution:

v 1.0

Jim Thario · Answer 1 · Thu Apr 28 2022 00:21:06 GMT+0800 (China Standard Time)

Hi, thanks for documenting this issue. We'll work on a solution for our next release and discuss here to get your thoughts on approach.

Andrzej · Answer 2 · Thu Apr 28 2022 00:24:23 GMT+0800 (China Standard Time)

OK thank you

My initial idea for a possible solution is for Redis to send pub/sub notifications via SNS on key expiry - which would trigger a Lambda (if that's possible, of course - will check in the coming days).

Andrzej · Answer 3 · Fri May 06 2022 20:02:01 GMT+0800 (China Standard Time)

Just so you know - we've tested my idea (above) and it's working quite nicely - so it's certainly one possible way of solving the issue. We have a Python listener script that triggers a serving_num change in response to a Redis pub/sub notification when a request ID key-value entry expires in Redis (i.e. the user abandons the waiting room without getting a session token).

Jim Thario · Answer 4 · Sat May 07 2022 03:19:47 GMT+0800 (China Standard Time)

If I am understanding the fix, do you set an expiry time that starts when the request ID becomes eligible for a token, and send the notification when that time passes?

Andrzej · Answer 5 · Wed May 11 2022 17:39:36 GMT+0800 (China Standard Time)

Yes, the original idea was to set an expiry time when saving the request ID into Redis - and then periodically extend it while the user is polling the service (while waiting in the queue). If the user abandons the queue (before they're let in / obtain a token), the Redis entry doesn't get extended because no polling occurs - and it eventually expires. We'd then use Redis pub/sub notifications to listen for the expiration event for any such key and could use that information somehow.
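
The expire-and-extend idea described above can be sketched without a Redis server. The class below is a hypothetical in-memory stand-in (the names `HeartbeatStore`, `touch` and `sweep` are made up for illustration): each poll pushes the entry's deadline forward, and entries that are not refreshed are reported through a callback. In a real deployment the deadline would presumably be a key TTL and the callback a subscriber to Redis keyspace "expired" events, which have to be enabled through the `notify-keyspace-events` setting; note that Redis emits those events when it actually deletes the key, which can be later than the nominal TTL.

```python
class HeartbeatStore:
    """In-memory model of request-ID entries with a sliding TTL (illustrative only)."""

    def __init__(self, ttl, on_expired):
        self.ttl = ttl                # seconds an entry survives without a poll
        self.on_expired = on_expired  # callback invoked with each expired request ID
        self.deadlines = {}           # request_id -> time at which the entry expires

    def touch(self, request_id, now):
        # Called on enqueue and on every poll; extends the entry's lifetime.
        self.deadlines[request_id] = now + self.ttl

    def sweep(self, now):
        # Remove every entry whose deadline has passed and report it.
        expired = [rid for rid, deadline in self.deadlines.items() if deadline <= now]
        for rid in expired:
            del self.deadlines[rid]
            self.on_expired(rid)
        return expired


abandoned = []
store = HeartbeatStore(ttl=30, on_expired=abandoned.append)
store.touch("req-a", now=0)
store.touch("req-b", now=0)
store.touch("req-a", now=20)   # req-a keeps polling, req-b has gone away
store.sweep(now=35)
print(abandoned)
```

Time is passed in explicitly so the behaviour is deterministic; only the request that stopped polling ends up in `abandoned`.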

However, upon further investigation we figured that just knowing, e.g., how many people abandoned the queue isn't really helpful, as we can't just, e.g., increase serving_num - it would have other undesirable side effects…

For this reason we've now decided we might want to build a fully custom solution based on Redis Sorted Sets instead of just counters (like serving_num).
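
The thread does not describe the sorted-set design, so the following is only one guess at what it could look like, modelled in plain Python rather than against Redis. The class `RankedQueue` and its methods are hypothetical; they loosely mirror what ZADD, ZREM and ZRANK would do on a sorted set scored by enqueue order. The point of the sketch is that a user's place is their rank among the entries still present, so removing an abandoned entry moves everyone behind it forward instead of leaving a gap in a counter.

```python
import bisect


class RankedQueue:
    """Pure-Python model of a waiting room kept as a sorted set (illustrative only)."""

    def __init__(self):
        self._scores = {}    # request_id -> score (enqueue sequence number)
        self._ordered = []   # sorted list of (score, request_id)
        self._next_score = 0

    def enqueue(self, request_id):
        # Comparable to adding a member with an increasing score.
        score = self._next_score
        self._next_score += 1
        self._scores[request_id] = score
        bisect.insort(self._ordered, (score, request_id))

    def remove(self, request_id):
        # Comparable to removing a member, e.g. once its entry has expired.
        score = self._scores.pop(request_id, None)
        if score is not None:
            self._ordered.remove((score, request_id))

    def rank(self, request_id):
        # 0-based place among the entries that are still present, or None.
        score = self._scores.get(request_id)
        if score is None:
            return None
        return bisect.bisect_left(self._ordered, (score, request_id))

    def admit(self, free_slots):
        # Let in the first `free_slots` entries that are still waiting.
        admitted = [rid for _, rid in self._ordered[:free_slots]]
        for rid in admitted:
            self.remove(rid)
        return admitted


queue = RankedQueue()
for i in range(1, 31):
    queue.enqueue(f"r{i}")

first_batch = queue.admit(10)          # r1..r10 are let in
for i in range(11, 21):
    queue.remove(f"r{i}")              # r11..r20 abandon the waiting room
rank_before = queue.rank("r21")        # r21 is now at the head of the queue
second_batch = queue.admit(10)         # freed slots go to r21..r30
print(rank_before, second_batch)
```

Replaying the scenario from the issue with this model, the ten freed slots go to the users who are still waiting rather than to positions that were abandoned.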

Queue-Fair · Answer 6 · Fri Jul 15 2022 23:00:13 GMT+0800 (China Standard Time)

This is a much tougher problem than it might appear at first glance - and there are other related ones that it doesn't look like you have encountered yet - see for example "Why Accuracy Matters for Virtual Waiting Rooms and Online Queues".

It's far harder than it looks - you're going to need a sophisticated AI if you want to solve it exactly, like we did.

TBH rather than the extensive rebuild your system would require to solve this, it would be much more cost effective for Amazon to use or buy Queue-Fair. Having invented and patented the original rate-based Virtual Waiting Room for busy websites way back in 2004, we already solved this problem perfectly - as well as the others you are yet to face - and our system is far cheaper and more efficient to run for your cloud users to boot. Just saying!

Hope this is helpful!

Jim Thario · Answer 7 · Sat Jul 16 2022 00:35:25 GMT+0800 (China Standard Time)

Thanks for the feedback. We're taking a different approach.
We publish sample customer operational costs in the implementation guide here:
https://docs.aws.amazon.com/solutions/latest/virtual-waiting-room-on-aws/cost.html
Can you provide a link to your cost model? I probably wasn't looking in the right place.
Best of luck!

Queue-Fair · Answer 8 · Sat Jul 16 2022 00:59:11 GMT+0800 (China Standard Time)

Hi Jim, we don't show pricing on our website after AB testing revealed that it didn't help with conversions, even though we're by far the cheapest provider on the market, but we'd love to have a commercial conversation with you! Please email sales AT queue-fair DOT com at your convenience and we'll be happy to hear from you.

Thanks for the link to your costs table. I'm not going to go into detail on this public forum, but I can tell you our cost model is much cheaper. Much much cheaper. Like less than 1% of your costs.

Have a lovely weekend!