robinhood / thorn

Easy Webhooks for Python

Ordering guarantees

erikash opened this issue · comments

Hi,
Great stuff! 👍

I didn't see any info in the documentation regarding ordering guarantees of posting callbacks (in success and failure scenarios).

Could you please elaborate?

Thanks!
Erik.

Do you mean the on_success/on_error arguments to Event.send?

These callbacks are there for async programs (think asyncio, tornado, twisted, gevent, eventlet), and are not actually called when you use celery as the dispatcher (as you would have to serialize the functions). They won't be useful until you connect thorn to an event loop.

Success callbacks are called as soon as there is a response from the web request, so ordering depends on how fast the URLs respond, same with errbacks.
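To illustrate the point (this is a plain-Python sketch, not thorn's actual dispatcher): when callbacks fire as responses arrive, their order depends on how fast each URL responds, not on the order the requests were sent.

```python
def dispatch(requests, on_success):
    # Simulate async webhook delivery: responses complete in latency order,
    # not in the order the requests were issued.
    for req in sorted(requests, key=lambda r: r["latency"]):
        on_success(req["url"])

received = []
dispatch(
    [{"url": "https://a.example/hook", "latency": 0.3},
     {"url": "https://b.example/hook", "latency": 0.1}],
    on_success=received.append,
)
# The faster endpoint's callback fires first, regardless of send order.
```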

Hi,
I didn't refer to the async callbacks. I'll rephrase the question:
Suppose the user updates his username twice: a -> b and b -> c, which would trigger two HTTP webhook callbacks for subscribed consumers (other microservices, for example).
What is the ordering guarantee between the aforementioned updates?
Is the first update a -> b guaranteed to be delivered to subscribers before b -> c, even if there was an error in a dispatching attempt?

If the behaviour is different between dispatchers (celery, tornado, asyncio), then please elaborate.

Does the question make more sense now?

Oops, I'm very sorry for the late reply, but my Github notification queue is miles deep :(

There are no ordering guarantees in the case you mention, where a user updates his username twice.

  • Presumably this is a website and the user data is stored in a database.
  • If the user updates his username on web server A, then again quickly on web server B,
    this could happen with very little delay, e.g. if the user submits a form multiple
    times in rapid succession.
  • As there are multiple connections to the database, there's no guarantee as to whose
    value will win: there are kernel TCP/IP stacks, routers and switches, connection pools,
    async I/O, pgpool/mysql proxy/pgbouncer, etc, etc.
  • Having the client send a timestamp to the database will not work, as you cannot rely on
    timestamps/wall-clock time to order events in a distributed system. NTP doesn't
    make a difference, not even GPS/atomic clocks.

So even at the database level you cannot guarantee the order, but you can
regard whichever value wins in the database as the now-valid value. Sadly, we face the same
problems when dispatching the webhooks:

  • Both webserver A and webserver B will attempt to send messages to the broker at
    the same time, and you cannot predetermine who will win the race.
  • If the two webhooks are dispatched from separate workers, then you also will not be able
    to predetermine who will be able to deliver the request first.

It's impossible to solve this problem in a distributed system. You could use a distributed lock, but I don't think locks suit this purpose; or we could use Lamport timestamps/vector clocks, but then AFAIU the webhook consumers would be required to take an active part in the system.
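For reference, the Lamport timestamp idea works roughly like this (a minimal sketch, not part of thorn). Note that the consumer has to maintain its own clock and merge in the sender's value on every message, which is the "active part" referred to above.

```python
class LamportClock:
    """Minimal Lamport clock: orders events causally, not by wall-clock time."""

    def __init__(self):
        self.value = 0

    def tick(self):
        # A local event happened: advance the clock.
        self.value += 1
        return self.value

    def receive(self, remote_value):
        # On receiving a message, jump past the sender's timestamp
        # so every subsequent local event is ordered after it.
        self.value = max(self.value, remote_value) + 1
        return self.value

node_a, node_b = LamportClock(), LamportClock()
t1 = node_a.tick()          # the a -> b update happens on node A
t2 = node_b.receive(t1)     # node B merges A's timestamp on receipt
assert t2 > t1              # B's later events are causally ordered after the update
```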

RabbitMQ has some ordering guarantees, such that if you send two messages from the same connection they will be received in the same order, but that falls apart if a consumer rejects a message, dies in the middle of processing, or if there's a partition in an HA setup.

So messages coming from multiple clients are impossible to order, but
if you consider the data in the database to be the consistent state, then you can work around this by requiring webhook consumers to refetch the data when they receive an event.

For example when you receive the message:

{"event": "user.changed",
 "data": {"username": "george", ...},
 "ref": "http://example.com/user/3124/"}

You make a request to http://example.com/user/3124/ to get the canonical version of the data.
Even in this case the data may change between making the request and
receiving the HTTP response. This should illustrate how hopelessly difficult it is to keep data consistent
in a distributed environment; instead of thinking about ordering guarantees, you should consider
how you can make state updates idempotent.
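A sketch of that refetch-and-apply-idempotently pattern (here `fetch_canonical` and the in-memory `canonical` dict are hypothetical stand-ins for an HTTP GET against the `ref` URL, e.g. `requests.get(ref).json()`):

```python
# Hypothetical canonical store, standing in for the publisher's HTTP API.
canonical = {"http://example.com/user/3124/": {"username": "george"}}

def fetch_canonical(ref):
    # Stand-in for: requests.get(ref).json()
    return canonical[ref]

state = {}  # the consumer's local view of the data

def handle_event(message):
    # Ignore the possibly-stale payload; always refetch the canonical version.
    data = fetch_canonical(message["ref"])
    # Idempotent upsert: applying the same event twice leaves the same state.
    state[message["ref"]] = data

msg = {"event": "user.changed",
       "data": {"username": "stale-name"},
       "ref": "http://example.com/user/3124/"}
handle_event(msg)
handle_event(msg)  # duplicate or out-of-order delivery is harmless
```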

What will the endpoint use the data for? If it associates data with the username in an internal database, it then needs to make sure usernames cannot be reused, and that historical usernames point to the new name.

Thanks for the elaborate response! 👍

I completely agree with you regarding ordering of commands (e.g update username), it's practically impossible to synchronise two mouse clicks.

On the other hand there are a few ways for consumers to process the notifications in-order:

  1. Subscribers are responsible for maintaining order:
    a. Publisher persists a "notification" object in the same transaction as the "update username" command with a revision.
    b. When a subscriber receives a web-hook, it can compare it to its last processed notification revision, and query the aforementioned notification storage if it missed a notification or received one out of order.
  2. Publishers are responsible for maintaining order:
    a. Publisher persists a "notification" object in the same transaction as the "update username" command with a revision.
    b. The publisher will maintain an incrementing sequence of the last successfully processed notification (2xx status code) per subscriber and retry on failure before dispatching the next notification.
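Option 1 could be sketched like this (a minimal sketch; it assumes the publisher issues a gapless, incrementing revision number inside the same transaction as the update):

```python
class Subscriber:
    """Buffers out-of-order notifications and applies them by revision."""

    def __init__(self):
        self.last_revision = 0
        self.pending = {}   # held-back notifications, keyed by revision
        self.applied = []   # updates applied, in order

    def on_webhook(self, notification):
        rev = notification["revision"]
        if rev <= self.last_revision:
            return  # duplicate or already-processed: drop it
        self.pending[rev] = notification
        # Apply as many consecutive revisions as we now have.
        while self.last_revision + 1 in self.pending:
            nxt = self.pending.pop(self.last_revision + 1)
            self.applied.append(nxt["data"])
            self.last_revision += 1

s = Subscriber()
s.on_webhook({"revision": 2, "data": "b -> c"})  # arrives first: held back
s.on_webhook({"revision": 1, "data": "a -> b"})  # gap filled: both apply in order
```

In a real system the subscriber wouldn't buffer indefinitely: on detecting a gap it would query the publisher's notification storage for the missing revisions, as described in step b.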

If the subscribers are not processing notifications in-order, how would they handle the following stock market scenario:

Order state is updated and the following web-hooks are published after an IOC order is executed by the exchange:

  1. OrderFilled
  2. OrderCancelled

The subscriber received the web-hooks in reverse order:

  1. OrderCancelled
  2. OrderFilled

How will a subscriber handle this scenario? (assuming that fetching the canonical version will not provide enough context)

I'm facing these challenges myself, so I'm eager to hear your opinion!

Thanks,
Erik.

Bump!