[Help] Errors when ingesting many objects (~2000) in bursts
sladkoff opened this issue · comments
I have a relatively simple use case and not a lot of data:
- There's a scheduled task that runs every x hours to index some objects in sonic
- It should remove any outdated data and re-insert the latest state of the objects
- The procedure looks something like this:
```js
// flush the whole collection to get rid of potentially deleted objects
await ingest.flushc(collection)

// ... for all our entities
for (const entity of entities) {
  // remove the object (this duplicates the flushc call above, so it could be skipped)
  await ingest.flusho(collection, bucket, entity.id)
  // re-insert the object with the latest data
  await ingest.push(collection, bucket, entity.id, entity.text)
}
```
This looks fine in theory, but in practice a lot of the data my application code pushes is missing from the Sonic index. My investigation so far has led to some errors thrown by node-sonic-channel (see the questions below), so I'm opening this issue here.
Question 1
I'm a little lost on how to use the Ingest connection over time. In the above scenario, I'm iterating over n entries and doing work that might take some time. Should I open one connection for the whole procedure? As I understand it, the connection can be closed by a timeout or for other reasons. Is there a recipe for reconnecting and continuing such a batch procedure if the connection closes? Or is it safer to open and close a connection per ingest.push call?
Question 2
What does this error mean and what should I do to prevent it?
```
Error: Offline stack is full, cannot stack more operations until Sonic Channel connection is restored (maximum size set to: 500 entries)
```
Probably comes from here.
Question 3
What does this error mean and what should I do to prevent it?
```
channel closed
```
Probably comes from here.
All of this seems like a very simple use-case so I'm assuming that I'm doing something very wrong. I'd appreciate some help. Thanks!
Sonic has a backpressure safety mechanism, which is basically a kill-switch for when there are WAY too many operations pending on the server side: it will abort the flooding client connection. This is on Sonic's side and cannot be changed, as that would imply increasing the network-related buffers.
Now, node-sonic-channel also has a backpressure management algorithm (look for this.__emitQueue), which internally queues pending tasks... until there are too many (in order to protect the running Node.js process, as the node-sonic-channel library might be running in a process shared with, e.g., an HTTP server). You can change this limit by increasing the emitQueueMaxSize option when constructing the Sonic Channel client.
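To make the client-side limit concrete, here is a minimal sketch of a bounded offline queue like the one behind this.__emitQueue. This is not the library's actual code, just an illustration of the behavior: operations are buffered while the connection is down, and once the queue reaches its maximum size, further operations are rejected with an error like the one in Question 2.

```javascript
// Minimal sketch of a bounded offline queue, mimicking the backpressure
// behavior described above. NOT node-sonic-channel's actual implementation.
class BoundedEmitQueue {
  constructor(maxSize = 500) {
    this.maxSize = maxSize;
    this.queue = [];
  }

  // Buffer an operation while the connection is down; reject when full.
  enqueue(operation) {
    if (this.queue.length >= this.maxSize) {
      throw new Error(
        "Offline stack is full, cannot stack more operations until " +
          "Sonic Channel connection is restored " +
          `(maximum size set to: ${this.maxSize} entries)`
      );
    }
    this.queue.push(operation);
  }

  // Replay buffered operations once the connection is restored.
  drain(send) {
    while (this.queue.length > 0) {
      send(this.queue.shift());
    }
  }
}
```

With node-sonic-channel itself, the equivalent knob would be the emitQueueMaxSize option mentioned above, passed in the client's constructor options; check the library's README for the exact shape of the options object.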
Ok, thanks for the insight. We'll continue using one connection per push for now, because otherwise we'd need to implement some sort of throttling on our side to avoid hitting the library's limits. This makes our process a bit slower, but the implementation is simpler.
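For anyone who does want to keep a single long-lived connection, the throttling mentioned above can be as simple as pushing entities in fixed-size chunks and awaiting each chunk before starting the next, so the number of in-flight operations never approaches the queue limit. A sketch, assuming a hypothetical async push(entity) function standing in for the ingest.push(...) call:

```javascript
// Push entities in chunks of `chunkSize`, waiting for each chunk to
// complete before starting the next, so that at most `chunkSize`
// operations are ever pending at once. `push` is a hypothetical async
// function wrapping ingest.push(...) for a single entity.
async function pushInChunks(entities, push, chunkSize = 100) {
  for (let i = 0; i < entities.length; i += chunkSize) {
    const chunk = entities.slice(i, i + chunkSize);
    await Promise.all(chunk.map((entity) => push(entity)));
  }
}
```

Choosing chunkSize well below emitQueueMaxSize leaves headroom for other traffic on the same channel.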