[Help] Errors when ingesting many objects (~2000) in bursts
sladkoff opened this issue · comments
I have a relatively simple use case and not a lot of data:
- There's a scheduled task that runs every x hours to index some objects in sonic
- It should remove any outdated data and re-insert the latest state of the objects
- The procedure looks something like this:
```js
// flush the whole collection to get rid of potentially deleted objects
await ingest.flushc(collection)

// ... for all our entities
for (const entity of entities) {
  // remove the object (this duplicates the flushc call above, so it could be skipped)
  await ingest.flusho(collection, bucket, entity.id)
  // re-insert the object with the latest data
  await ingest.push(collection, bucket, entity.id, entity.text)
}
```
This looks fine in theory, but in practice a lot of the data my application code pushes is missing from the Sonic index. My investigation so far has led to some errors thrown by node-sonic-channel (see the questions below), so I'm opening this issue here.
Question 1
I'm a little lost on how to use the Ingest connection over time. In the above scenario, I'm iterating over n entries and doing work that might take some time. Should I open one connection for the whole procedure? As I understand it, the connection can be closed by a timeout or for other reasons. Is there a recipe for reconnecting and continuing such a batch procedure if the connection closes? Or is it safer to open and close a connection per ingest.push call?
Question 2
What does this error mean and what should I do to prevent it?
```
Error: Offline stack is full, cannot stack more operations until Sonic Channel connection is restored (maximum size set to: 500 entries)
```
Probably comes from here.
Question 3
What does this error mean and what should I do to prevent it?
```
channel closed
```
Probably comes from here.
All of this seems like a very simple use-case so I'm assuming that I'm doing something very wrong. I'd appreciate some help. Thanks!
Sonic has a backpressure safety mechanism, which is basically a kill-switch for when there are WAY too many operations pending on the server side: it will abort the flooding client connection. This is on Sonic's side and cannot be changed, as that would imply increasing the network-related buffers.
Now, node-sonic-channel also has a backpressure management algorithm (look for this.__emitQueue), which internally queues pending tasks... until there are too many (in order to protect the running Node.js process, as the node-sonic-channel library might be running in a process shared with, e.g., an HTTP server). You can change this limit by increasing the emitQueueMaxSize option when constructing the Sonic Channel client.
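To make the client-side limit concrete, here is a minimal sketch of a bounded offline queue like the one behind this.__emitQueue. This is not the library's actual code, just an illustration of the behavior: operations are buffered while the connection is down, and once the queue reaches its maximum size, further operations are rejected with an error like the one in Question 2.

```javascript
// Minimal sketch of a bounded offline queue, mimicking the backpressure
// behavior described above. NOT node-sonic-channel's actual implementation.
class BoundedEmitQueue {
  constructor(maxSize = 500) {
    this.maxSize = maxSize;
    this.queue = [];
  }

  // Buffer an operation while the connection is down; reject when full.
  enqueue(operation) {
    if (this.queue.length >= this.maxSize) {
      throw new Error(
        "Offline stack is full, cannot stack more operations until " +
          "Sonic Channel connection is restored " +
          `(maximum size set to: ${this.maxSize} entries)`
      );
    }
    this.queue.push(operation);
  }

  // Replay buffered operations once the connection is restored.
  drain(send) {
    while (this.queue.length > 0) {
      send(this.queue.shift());
    }
  }
}
```

With node-sonic-channel itself, the equivalent knob would be the emitQueueMaxSize option mentioned above, passed in the client's constructor options; check the library's README for the exact shape of the options object.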
Ok, thanks for the insight. We'll continue using one connection per push for now, because otherwise we'd need to implement some sort of throttling on our side to avoid hitting the library's limits. This makes our process a bit slower, but the implementation is simpler.
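For anyone who does want to keep a single long-lived connection, the throttling mentioned above can be as simple as pushing entities in fixed-size chunks and awaiting each chunk before starting the next, so the number of in-flight operations never approaches the queue limit. A sketch, assuming a hypothetical async push(entity) function standing in for the ingest.push(...) call:

```javascript
// Push entities in chunks of `chunkSize`, waiting for each chunk to
// complete before starting the next, so that at most `chunkSize`
// operations are ever pending at once. `push` is a hypothetical async
// function wrapping ingest.push(...) for a single entity.
async function pushInChunks(entities, push, chunkSize = 100) {
  for (let i = 0; i < entities.length; i += chunkSize) {
    const chunk = entities.slice(i, i + chunkSize);
    await Promise.all(chunk.map((entity) => push(entity)));
  }
}
```

Choosing chunkSize well below emitQueueMaxSize leaves headroom for other traffic on the same channel.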