zalando / nakadi

A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues

Home Page: https://nakadi.io

Add starting offset attribute to event type partition info

PetrGlad opened this issue

This is an API improvement. Our Nakadi client archives incoming event streams in external persistent storage. Our goal is to store all events without any data loss.

For our use cases, we would like to get a starting offset for an event type partition that is consistent with the other parts of the Nakadi API. Elsewhere in the API, the starting offset is "offset of first event - 1". However, the event type partition information returned by partition_get provides oldest_available_offset, which points exactly at the first available event. So to get a starting stream offset that is consistent with the rest of the API, we have to use cursor arithmetic to subtract 1 from it.

We know that there is a placeholder offset, begin, that can be used to specify the oldest offset of the stream, whatever it is. But in our case we would like to

  • know the exact offset of the stream start, as we use this value to store and compare checkpoints
  • know the exact number of events between some outdated cursor and the start of the stream

In the latter case, we can work around the issue by subtracting 1 from the distance returned by cursor-distances when comparing cursors.
But for checkpointing we use the starting offset as, for instance, an offset marker for stored data. Normally we use the offset of the last event in the last received batch for this. But after a restart we might find that older events have already been discarded, and then we would still like to have the actual value of the "begin" offset for the same purpose.
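To make the off-by-one concrete, here is a minimal sketch, assuming offsets are modelled as plain Long positions within one partition (real Nakadi offsets are opaque strings, so this only illustrates the arithmetic). The checkpoint holds the offset of the last archived event, and distance mirrors the documented cursor-distances semantics (initial offset exclusive, so equal offsets give 0). The object and function names are hypothetical.

```scala
object LostEvents {
  // Mirrors cursor-distances: number of offsets from `initial` (exclusive) to `fin`.
  def distance(initial: Long, fin: Long): Long = fin - initial

  // Events discarded by retention after our checkpoint.
  // checkpoint:      offset of the last event we archived.
  // oldestAvailable: the partition's oldest_available_offset (first retained event).
  // The stream start is oldestAvailable - 1, hence the "- 1" workaround.
  def lostEvents(checkpoint: Long, oldestAvailable: Long): Long =
    math.max(0L, distance(checkpoint, oldestAvailable) - 1)
}
```

For example, a checkpoint at 5 with oldest_available_offset 10 means events 6 to 9 were lost, so lostEvents(5, 10) is 4; a checkpoint at 9 loses nothing.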

We could use the partition's oldest_available_offset as the stream start, but then we would lose the oldest event if we reset to this position, and this would introduce inconsistencies into the completeness checks we use to detect data loss. The problem is especially visible for infrequent events that are generated at intervals comparable to or longer than the retention time. For such events, the available event range normally consists of 1 or 2 events, or is empty.
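The skipped event can be shown with a toy model, assuming (as the issue describes) that a consumer positioned at a cursor receives only events whose offset is strictly greater than that cursor, and again treating offsets as Long values. eventsAfter is a hypothetical helper, not a Nakadi call.

```scala
object ResetExample {
  // Hypothetical model: `retained` holds the offsets still within retention;
  // a consumer at `cursor` receives only events with offset > cursor.
  def eventsAfter(retained: Seq[Long], cursor: Long): Seq[Long] =
    retained.filter(_ > cursor)

  def main(args: Array[String]): Unit = {
    val retained = Seq(41L) // infrequent events: only one left within retention
    val oldestAvailable = retained.head

    println(eventsAfter(retained, oldestAvailable))     // List()   : event 41 is skipped
    println(eventsAfter(retained, oldestAvailable - 1)) // List(41) : nothing is lost
  }
}
```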

In all our cases, the inclusive beginning offset adds a special case that has to be handled separately in the Nakadi client's code, e.g. starting offset comparison, the empty or single-event stream case, and so on. It also requires us to use cursor arithmetic with shifted-cursors to work around this inconsistency.

To keep the change backwards compatible, I suggest adding a new attribute to the Partition description that would point to "oldest_available_offset - 1". The attribute could be named, say, starting_offset.

To clarify:

  • Invariant: starting_offset == oldest_available_offset - 1
  • Empty stream => starting_offset == newest_available_offset
  • Invariant: starting_offset == actual begin value at the moment
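The invariants above can be checked against a small model. This is a sketch under the assumption that offsets are plain Long positions, and that an empty partition therefore reports an oldest_available_offset one past its newest_available_offset (which follows from the first two invariants). PartitionInfo, startingOffset and retainedCount are hypothetical names, not part of the Nakadi API.

```scala
// Hypothetical simplified model: real Nakadi offsets are opaque strings; here they
// are plain Long positions within one partition.
final case class PartitionInfo(oldestAvailableOffset: Long, newestAvailableOffset: Long)

object PartitionOffsets {
  // Proposed starting_offset: the position just before the first retained event,
  // i.e. a cursor from which a consumer would receive every retained event.
  def startingOffset(p: PartitionInfo): Long = p.oldestAvailableOffset - 1

  // The partition is empty when no offset lies between oldest and newest.
  def isEmpty(p: PartitionInfo): Boolean =
    p.oldestAvailableOffset > p.newestAvailableOffset

  // Number of retained events; with an exclusive start this is a plain difference.
  def retainedCount(p: PartitionInfo): Long =
    p.newestAvailableOffset - startingOffset(p)
}
```

With an exclusive start, the empty and single-event cases need no special handling: for PartitionInfo(13, 12) the starting offset equals the newest offset (12) and retainedCount is 0, and for PartitionInfo(10, 10) it is 1.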

If there are no objections, I can probably implement this myself.

Thank you @PetrGlad for your suggestion.

We reviewed it internally, but decided that it would be better not to implement this request. Here are the reasons that motivate this decision:

  • this field would not be compatible with some other backends (e.g., Kinesis)
  • we worry that it will confuse users even more, since starting_offset would be 1 position earlier than oldest_available_offset
  • this can already be achieved with one extra call to shifted-cursors, as you correctly point out in your issue description. We do not see a use case for a frequent use of this feature, so we believe that the extra call will not cause unreasonable delays in processing events from Nakadi. Furthermore, having Nakadi provide this extra field would actually require the same call to shifted-cursors, but internally.
  • an alternative is to use BEGIN, which has the added benefit of never being out of date. By using a concrete offset, you always run the risk that it expires between the moment you get the offset from Nakadi and the moment you request events from this offset.
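The extra shifted-cursors call mentioned in the reply can be sketched as follows, assuming the endpoint POST /event-types/{name}/shifted-cursors accepts a JSON array of cursors that each carry partition, offset and shift fields. Only the request body is built here; sending it with an HTTP client is left out, and shiftedCursorsBody is a hypothetical helper name.

```scala
object ShiftedCursors {
  final case class Cursor(partition: String, offset: String)

  // Builds a body asking Nakadi to move each cursor by `shift` events
  // (negative = backwards, so -1 turns oldest_available_offset into the stream start).
  // Assumes partition and offset contain no characters that need JSON escaping.
  def shiftedCursorsBody(cursors: Seq[Cursor], shift: Long): String =
    cursors
      .map(c => s"""{"partition":"${c.partition}","offset":"${c.offset}","shift":$shift}""")
      .mkString("[", ",", "]")
}
```

Usage: shiftedCursorsBody(Seq(Cursor("0", "42")), -1) yields a one-element array with "shift":-1 for partition "0".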

Nevertheless, your proposal highlights a rough edge in Nakadi, and we will aim to improve our API and/or documentation to make its usage easier.
If you have questions and/or alternative suggestions, we will of course be glad to help, and consider alternatives carefully. Feel free to re-open this issue if you would like to discuss it further.

I think Nakadi should take responsibility for isolating clients from backend details and present a consistent view of the event stream in any case.

Yes, as I said in my description, the new attribute does carry a risk of confusing users; it is just one of the possible backwards-compatible API changes. I think there could be alternative solutions.

In our case, BEGIN always being up to date is a disadvantage. We do want to fail fast if we miss data and see exactly how many events we have lost.

I hope you're OK with us doing things like

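    // An offset ending in "--1" is treated as already pointing before the first event,
    // so it is kept as is; otherwise the oldest cursor is shifted back by one.
    if (oldestCursor.offset().endsWith("--1"))
      oldestCursor
    else
      subscriptionManager.shiftOffsetBackByOne(oldestCursor)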

Now that we're working on a new version of Diga, I think we're going to either use the lower-level API or create subscriptions every time we start a stream. That would allow us to avoid offset resets altogether.