aws-solutions / virtual-waiting-room-on-aws

The Virtual Waiting Room on AWS solution helps absorb and control incoming user requests to your website during an unusually large burst of traffic, usually due to a large-scale event.

Abandoned WaitingRoom Requests will affect MaxSizeInlet strategy

andrzejbe opened this issue · comments

Describe the bug
When adding new requests to WaitingRoom the assumption is that each one of the queued requests will eventually get a token/session.

Based on this assumption, serving_num is later updated when, e.g., tokens/sessions expire - but NOT when a user abandons the waiting room before being granted a token/session.

In that case, the browser stops polling serving_num, leaving the corresponding RequestId entry in Redis. In an extreme example - specifically with the MaxSizeInlet strategy - if the number of people who abandon the WaitingRoom is greater than the number of people who successfully left the site after being granted a token (e.g. after a successful checkout or token expiry), the whole queue can get "stuck".

To Reproduce

  • set MaxSize to 10
  • add 30 requests to the queue
  • the first 10 requests will be "let in" and given a token/session
  • immediately afterwards, abandon the next 10 sessions in the waiting room - before a token is given to any of them
  • expire the 10 "active" sessions
  • serving_num should increase by 10 - but users index 10-20 are no longer polling (!)
  • waiting room is stuck

Note that it only affects MaxSizeInlet strategy
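
The reproduction steps above can be sketched as a toy, in-memory model. This is not the solution's actual code: the function names, the zero-based queue positions, and the rule "a polling client whose position is below serving_num gets a token" are simplifying assumptions made only to show why the queue stops moving.

```python
def claim_tokens(serving_num, polling, issued):
    """Hypothetical rule: polling clients below serving_num receive a token."""
    return {p for p in polling if p < serving_num and p not in issued}


def simulate_stuck_queue(max_size=10, total=30):
    """Replay the steps from the issue with plain Python sets."""
    polling = set(range(total))  # queue positions whose browsers still poll
    issued = set()               # positions that have ever received a token
    active = set()               # positions currently holding a token

    # The first max_size requests are let in.
    serving_num = max_size
    new = claim_tokens(serving_num, polling, issued)
    issued |= new
    active |= new

    # The next max_size users abandon the waiting room (they stop polling).
    polling -= set(range(max_size, 2 * max_size))

    # The active sessions expire, so the inlet advances serving_num.
    freed = len(active)
    active.clear()
    serving_num += freed

    # Only the abandoned positions are now eligible, so nobody claims a token.
    new = claim_tokens(serving_num, polling, issued)
    issued |= new
    active |= new

    still_waiting = {p for p in polling if p not in issued}
    return serving_num, len(active), len(still_waiting)


serving_num, active_count, waiting_count = simulate_stuck_queue()
print(serving_num, active_count, waiting_count)
```

In this model no sessions are active after the last step, so nothing can expire to advance serving_num again, while the remaining users keep polling from positions at or above it.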

Expected behavior
Some way (?) :) to detect when users waiting in the queue abandoned the request before being granted a token/session

Please complete the following information about the solution:

  • v 1.0

Hi, thanks for documenting this issue. We'll work on a solution for our next release and discuss here to get your thoughts on approach.

OK thank you

My initial idea for a possible solution is for Redis to send pubsub notifications via SNS on key expiry, which would trigger a Lambda (if that's possible, of course - will check in the coming days).

Just so you know - we've tested my idea (above) and it's working quite nicely, so it's certainly one possible way of solving the issue. We have a Python listener script that triggers a serving_num change in response to a Redis pubsub notification when a request ID key-value entry expires in Redis (i.e. the user abandons the waiting room without getting a session token).
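
The listener script itself is not shown in this thread. A minimal sketch of what such a listener might look like with redis-py and Redis keyspace notifications follows. The `request:` key prefix, the function names, and the `on_abandoned` callback are assumptions for illustration, not part of the solution.

```python
def parse_expired_key(message, prefix="request:"):
    """Return the request ID from a key-expiry pub/sub message, or None.

    The "request:" prefix is a hypothetical key naming scheme.
    """
    if message.get("type") != "pmessage":
        return None  # e.g. the initial subscription confirmation
    key = message.get("data")
    if isinstance(key, bytes):
        key = key.decode("utf-8")
    if not isinstance(key, str) or not key.startswith(prefix):
        return None
    return key[len(prefix):]


def listen_for_abandoned(on_abandoned, host="localhost", port=6379, db=0):
    """Block forever, calling on_abandoned(request_id) for each expired key."""
    import redis  # redis-py, imported lazily so the parser works without it

    client = redis.Redis(host=host, port=port, db=db)
    # Enable expired-key events; on a managed Redis service this setting
    # may have to be changed in the service configuration instead.
    client.config_set("notify-keyspace-events", "Ex")

    pubsub = client.pubsub()
    pubsub.psubscribe(f"__keyevent@{db}__:expired")
    for message in pubsub.listen():
        request_id = parse_expired_key(message)
        if request_id is not None:
            on_abandoned(request_id)
```

Calling `listen_for_abandoned(print)` would print each expired request ID. Redis pub/sub is fire-and-forget, so events published while the listener is disconnected are lost, and the expired event fires when Redis actually removes the key, which can be somewhat later than the TTL reaching zero.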

If I am understanding the fix, do you set an expiry time that starts when the request ID becomes eligible for a token, and send the notification when that time passes?

Yes, the original idea was to set an expiry time when saving the request ID into Redis, and then periodically extend it while the user is polling the service (i.e. while waiting in the queue). If the user abandoned the queue (before they're let in / obtain a token), the Redis entry wouldn’t get extended because no polling would occur, and it would eventually expire. We’d then use Redis pubsub notifications to listen for the expiration event of any such key and could use that information somehow.
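
The expire-and-extend idea described above can be illustrated without a Redis server. The class below is an in-memory stand-in, assuming a 30-second TTL and an injectable clock; in Redis the same pattern would use a key written with an expiry (`SET` with `EX`) and refreshed with `EXPIRE` on every poll. All names here are hypothetical.

```python
import time


class WaitingEntries:
    """In-memory stand-in for Redis keys that carry a TTL."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._deadline = {}  # request_id -> time at which the entry expires

    def enqueue(self, request_id):
        """Store the request ID with an initial expiry."""
        self._deadline[request_id] = self.clock() + self.ttl

    def poll(self, request_id):
        """Extend the expiry; return False if the entry has already expired."""
        self.purge()
        if request_id not in self._deadline:
            return False
        self._deadline[request_id] = self.clock() + self.ttl
        return True

    def purge(self):
        """Drop expired entries and return their request IDs."""
        now = self.clock()
        expired = [r for r, d in self._deadline.items() if d <= now]
        for request_id in expired:
            del self._deadline[request_id]
        return expired


# Usage with a fake clock: "a" keeps polling, "b" abandons the queue.
now = [0]
entries = WaitingEntries(ttl_seconds=30, clock=lambda: now[0])
entries.enqueue("a")
entries.enqueue("b")
now[0] = 20
entries.poll("a")          # extends "a" until t=50
now[0] = 40
abandoned = entries.purge()
print(abandoned)
```

The TTL has to be comfortably longer than the client polling interval, otherwise a user who is still waiting could be treated as having abandoned the queue.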

However, upon further investigation we figured that just knowing, e.g., how many people abandoned the queue isn’t really helpful, as we can’t just, e.g., increase serving_num - it would have other undesirable side effects…

For this reason we’ve now decided we might want to build a fully custom solution based on Redis Sorted Sets instead of just counters (like serving_num).
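
The thread does not describe the sorted-set design, so the following is only one possible reading of it, sketched in plain Python. It assumes two score-ordered collections, one by join order and one by last poll time, which in Redis could be sorted sets maintained with `ZADD` and pruned with `ZREMRANGEBYSCORE`. The class and method names are hypothetical.

```python
class SortedSetQueue:
    """Toy queue that admits by join order and skips users who stopped polling."""

    def __init__(self, max_size, stale_after):
        self.max_size = max_size        # cap on concurrently active sessions
        self.stale_after = stale_after  # seconds without a poll = abandoned
        self.joined = {}     # request_id -> join sequence (sorted-set score)
        self.last_seen = {}  # request_id -> last poll time (sorted-set score)
        self.active = set()
        self._seq = 0

    def join(self, request_id, now):
        self.joined[request_id] = self._seq
        self._seq += 1
        self.last_seen[request_id] = now

    def heartbeat(self, request_id, now):
        """Record a poll from a user who is still waiting."""
        if request_id in self.joined:
            self.last_seen[request_id] = now

    def prune(self, now):
        """Remove waiting entries whose last poll is too old."""
        cutoff = now - self.stale_after
        stale = [r for r, t in self.last_seen.items() if t < cutoff]
        for request_id in stale:
            del self.joined[request_id]
            del self.last_seen[request_id]
        return stale

    def admit(self, now):
        """Fill free capacity with the earliest joiners that are still polling."""
        self.prune(now)
        free = max(self.max_size - len(self.active), 0)
        admitted = sorted(self.joined, key=self.joined.get)[:free]
        for request_id in admitted:
            del self.joined[request_id]
            del self.last_seen[request_id]
            self.active.add(request_id)
        return admitted

    def release(self, request_id):
        """Call when a session ends (checkout completed or token expired)."""
        self.active.discard(request_id)


# Replay the scenario from the issue: r10..r19 abandon, r20..r29 keep polling.
queue = SortedSetQueue(max_size=10, stale_after=30)
for i in range(30):
    queue.join(f"r{i}", now=0)
first_batch = queue.admit(now=0)
for i in range(20, 30):
    queue.heartbeat(f"r{i}", now=60)
for request_id in first_batch:
    queue.release(request_id)
second_batch = queue.admit(now=60)
print(second_batch)
```

Because admission is decided from who is still present rather than from a counter, abandoned entries do not consume slots in this sketch.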

This is a much tougher problem than it might appear at first glance - and there are other related ones that it doesn't look like you have encountered yet - see for example Why Accuracy Matters for Virtual Waiting Rooms and Online Queues.

It's far harder than it looks - you're going to need a sophisticated AI if you want to solve it exactly, like we did.

TBH rather than the extensive rebuild your system would require to solve this, it would be much more cost effective for Amazon to use or buy Queue-Fair. Having invented and patented the original rate-based Virtual Waiting Room for busy websites way back in 2004, we already solved this problem perfectly - as well as the others you are yet to face - and our system is far cheaper and more efficient to run for your cloud users to boot. Just saying!

Hope this is helpful!

Thanks for the feedback. We're taking a different approach.
We publish sample customer operational costs in the implementation guide here:
https://docs.aws.amazon.com/solutions/latest/virtual-waiting-room-on-aws/cost.html
Can you provide a link to your cost model? I probably wasn't looking in the right place.
Best of luck!

Hi Jim, we don't show pricing on our website after AB testing revealed that it didn't help with conversions, even though we're by far the cheapest provider on the market, but we'd love to have a commercial conversation with you! Please email sales AT queue-fair DOT com at your convenience and we'll be happy to hear from you.

Thanks for the link to your costs table. I'm not going to go into detail on this public forum, but I can tell you our cost model is much cheaper. Much much cheaper. Like less than 1% of your costs.

Have a lovely weekend!