FR: we need to be sure writing points to db was successful

Question

mburger81 opened this issue 6 years ago · comments

Hello, we are using your lib, which is working well, and we are trying to contribute some code, as we did today with PR #4.

Another thing we need, and which I think could also be interesting for others, is to know whether writing one or more points was successful. Naturally this is not so easy, because the lib does the work asynchronously and with multiple points, which is GREAT!
Having looked at the code, we think we or you could do it without much effort.

First of all, I think writeBuffers in SocketConnection should return true or false depending on whether the call was successful.
In the run method, where you write the Points into the buffer, you could keep track of which Points you are trying to write. If writeBuffers returns true, for example, you could invoke a callback on the PointFactory, passing all the points that were written successfully.

What happens if the write was not successful?
Probably you could just offer the points to the pointQueue again? In that case the question is: is it possible that some points were stored in the DB and some were not? What happens if the format of one point is wrong?

It seems that nothing is written if we have a problem at the protocol level:
{"error":"unable to parse points"}

Or we get partially written values if, for example, a value has the wrong type, but you don't know which point it is:
{"error":"partial write: field type conflict: input field \"value2\" on measurement \"cpu\" is type integer, already exists as type float dropped=1"}
But in this case you can probably assume that a point is wrong and could NEVER be written to the database. In this case perhaps you could always remove the point?!?
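To make the distinction between the two responses concrete, here is a minimal sketch that classifies an error body by its message text. The class `WriteErrorClassifier` and its methods are hypothetical and not part of influx4j; the only inputs assumed are the two error strings quoted above, and matching on message text is an assumption rather than a documented contract.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of influx4j): classifies an InfluxDB error body by its text. */
final class WriteErrorClassifier {
    enum Kind { PARSE_ERROR, PARTIAL_WRITE, UNKNOWN }

    // "dropped=<n>" appears at the end of the partial-write message quoted in this issue.
    private static final Pattern DROPPED = Pattern.compile("dropped=(\\d+)");

    static Kind classify(String errorBody) {
        // Check "partial write" first: some points were stored, some were dropped.
        if (errorBody.contains("partial write")) return Kind.PARTIAL_WRITE;
        if (errorBody.contains("unable to parse")) return Kind.PARSE_ERROR;
        return Kind.UNKNOWN;
    }

    /** Returns the number of dropped points, or -1 if the message does not carry one. */
    static int droppedCount(String errorBody) {
        Matcher m = DROPPED.matcher(errorBody);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

Note that the response only reports how many points were dropped, not which ones, so a caller still cannot single out the offending point from this information alone.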

Another solution could be to always delegate the responsibility to the client: give it the points the lib tried to write and the return status of the POST. The client can then decide whether to write the points to the InfluxDB class again or to drop them.
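A rough sketch of what such a delegation could look like. All names here (`WriteResultHandler`, `DelegatingWriter`) are made up for illustration and do not exist in influx4j; the transport is a stand-in function that returns an HTTP status code, and the sketch is single-threaded for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical callback: the client sees every attempted batch and its status. */
interface WriteResultHandler<P> {
    /** Return true to have a failed batch queued again, false to drop it. */
    boolean retry(List<P> attempted, int httpStatus);
}

/** Hypothetical writer (not influx4j): hands the retry decision to the client. */
final class DelegatingWriter<P> {
    private final ToIntFunction<List<P>> transport; // stand-in for the HTTP POST
    private final WriteResultHandler<P> handler;
    private final List<P> pending = new ArrayList<>();

    DelegatingWriter(ToIntFunction<List<P>> transport, WriteResultHandler<P> handler) {
        this.transport = transport;
        this.handler = handler;
    }

    void add(P point) { pending.add(point); }

    /** Sends everything pending once, then asks the handler what to do on failure. */
    void flush() {
        if (pending.isEmpty()) return;
        List<P> batch = new ArrayList<>(pending);
        pending.clear();
        int status = transport.applyAsInt(batch);
        boolean ok = status >= 200 && status < 300;
        boolean wantsRetry = handler.retry(Collections.unmodifiableList(batch), status);
        if (!ok && wantsRetry) pending.addAll(batch); // re-queued for the next flush
    }

    int pendingCount() { return pending.size(); }
}
```

A client could, for example, return true only for 5xx statuses, so that batches rejected as invalid are dropped while batches that failed because the server was unavailable are tried again.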

What do you think?
p.s. sorry for my bad English 😄

mburger81 · Answer 1 · Mon Nov 05 2018 17:51:43 GMT+0800 (China Standard Time)

@brettwooldridge Hello, could you please jump in on this FR? We would like to help improve the lib.

Brett Wooldridge · Answer 2 · Tue Dec 04 2018 15:06:39 GMT+0800 (China Standard Time)

@mburger81 Thanks for the ideas, I will need to think about them. My concern with tracking points is the performance overhead involved. The application is responsible for not writing invalid points -- such as a field type mismatch. I do not think the overhead of error reporting in this case is justified.

If there is a protocol error, such as influx4j generating an invalid wire-format, that should be considered a bug in influx4j, and again providing a retry/recovery feature is not justified.

The one valid case that I can think of is the handling of communications errors between influx4j and InfluxDB. In this case, it may be reasonable for influx4j to simply retry the inserts at least up to the point where the internal point queue is full -- after that, I feel that an error should be logged, but the points simply discarded. In this scenario, it may make sense for influx4j to provide a callback with all of the discarded points...

Kristian Waagan · Answer 3 · Wed Dec 05 2018 03:52:09 GMT+0800 (China Standard Time)

Looking at my current use of this library I assume the points will be written, but I don't really care if some points are lost.

This is partly because of the nature of the data, but also because I value other characteristics more:
Use of the influx4j library should not interfere with the purpose of the host application.

To me that means:
a) reasonable resource usage
b) not filling my disk with excessive error messages if InfluxDB is down / unreachable (but I would like some kind of failure indicator if the library knows something is wrong)
c) not halting / blocking my application if events can't be delivered

The above points may be relevant in the design / implementation of synchronous or guaranteed delivery.
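As an illustration of how those three constraints can coexist, here is a small sketch of a bounded, non-blocking submission path. `NonBlockingPointSink` is a hypothetical class, not influx4j's implementation: the bounded queue caps memory (a), a counter replaces one log line per failure (b), and `offer` never blocks the caller (c).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sink (not influx4j): bounded, non-blocking, counts drops instead of logging each. */
final class NonBlockingPointSink<P> {
    private final BlockingQueue<P> queue;
    private final AtomicLong dropped = new AtomicLong();

    NonBlockingPointSink(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Never blocks: returns false and counts the drop when the queue is full. */
    boolean submit(P point) {
        if (queue.offer(point)) return true;
        dropped.incrementAndGet();
        return false;
    }

    /** Cheap failure indicator the host application can poll or export as a metric. */
    long droppedCount() { return dropped.get(); }

    int queued() { return queue.size(); }
}
```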

mburger81 · Answer 4 · Wed Dec 05 2018 04:01:06 GMT+0800 (China Standard Time)

The first use case where we need an error callback is when InfluxDB is down or the internet connection is not working. I think we have to know if there is a fatal error, so that we can save the data locally and retry later. Is this not a valid and very important scenario? Thanks.

Brett Wooldridge · Answer 5 · Tue Dec 11 2018 02:53:35 GMT+0800 (China Standard Time)

@mburger81 Yes, I think an error callback is valid. I am considering what form it should take, and what information should be provided. That is why this issue is still open. I do not have a definite schedule regarding when it might be implemented -- but my goal is within the next several weeks.

mburger81 · Answer 6 · Wed Dec 19 2018 00:59:44 GMT+0800 (China Standard Time)

Hello, I was busy and only had time today to check your changes.

You added more detailed logging and also a check for valid points, but from what I can see this is only part of the final solution, right?

In the end I think we need a success callback and an error callback with a transaction identifier, or a set of unique identifiers for the points in each write attempt.
That way you can decide what to do with a set of points: remove them or add them again.
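One possible shape for this, as a sketch only: each write attempt gets a numeric batch identifier, and exactly one of two callbacks fires when the attempt completes. `BatchListener` and `BatchTracker` are hypothetical names and are not the API that influx4j ended up with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical listener: one callback per completed write attempt. */
interface BatchListener<P> {
    void onSuccess(long batchId, List<P> points);
    void onError(long batchId, List<P> points, Exception cause);
}

/** Hypothetical tracker (not influx4j): remembers which points belong to which attempt. */
final class BatchTracker<P> {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, List<P>> inFlight = new ConcurrentHashMap<>();
    private final BatchListener<P> listener;

    BatchTracker(BatchListener<P> listener) { this.listener = listener; }

    /** Registers a write attempt and returns its identifier. */
    long begin(List<P> points) {
        long id = nextId.getAndIncrement();
        inFlight.put(id, Collections.unmodifiableList(new ArrayList<>(points)));
        return id;
    }

    void succeeded(long batchId) {
        List<P> points = inFlight.remove(batchId);
        if (points != null) listener.onSuccess(batchId, points); // fires at most once
    }

    void failed(long batchId, Exception cause) {
        List<P> points = inFlight.remove(batchId);
        if (points != null) listener.onError(batchId, points, cause);
    }

    int inFlightCount() { return inFlight.size(); }
}
```

Keeping a copy of every in-flight batch is exactly the kind of tracking overhead raised earlier in the thread, so a design like this trades some memory and allocation for the ability to report precisely which points failed.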

Brett Wooldridge · Answer 7 · Wed Dec 19 2018 04:12:16 GMT+0800 (China Standard Time)

@mburger81 I have committed new changes. But it is 5am here in Tokyo, and I haven't had time to test them (at all). I'll review them again "tomorrow" (after a few hours of sleep).

Brett Wooldridge · Answer 8 · Fri Dec 21 2018 15:41:33 GMT+0800 (China Standard Time)

@mburger81 I put together a rough example of a "resilient" wrapper around influx4j here.

Basically, how the influx4j retry mechanism works is this:

The driver internal point buffer holds 64K Point objects.
If the write of points out to InfluxDB fails, in what appears to be a recoverable way, then the driver will continue to retry the batch every AutoFlushPeriod milliseconds (1000 by default).
New points that are written will continue to accumulate in the internal point buffer while the retries are on-going.
However, if the internal point buffer reaches 75% full (48K points), the retry batch (buffer) will be dumped (dropped).
Whenever the batch completes, either successfully or unsuccessfully, a user-specified InfluxDbListener will be called back (specifically, the outcome() method).
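The policy described in the steps above can be modeled in a few lines. This is a simplified, single-threaded sketch and not the driver's code: `RetryPolicySketch` and its members are hypothetical, the flush is invoked by hand instead of by an AutoFlushPeriod timer, the result is returned as an enum instead of being delivered through a listener, and it assumes that only newly accumulated points count toward the 75% threshold.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of the retry/dump policy; not influx4j's implementation. */
final class RetryPolicySketch<P> {
    enum Outcome { WRITTEN, RETRY_LATER, DUMPED, NOTHING_TO_DO }

    /** Stand-in for the network write; returns true on success. */
    interface Transport<T> { boolean write(List<T> batch); }

    private final int capacity;                         // the driver described above uses 64K
    private final Queue<P> buffer = new ArrayDeque<>(); // newly submitted points
    private List<P> retryBatch;                         // batch whose last write failed, if any

    RetryPolicySketch(int capacity) { this.capacity = capacity; }

    boolean submit(P point) {
        if (buffer.size() >= capacity) return false;
        buffer.add(point);
        return true;
    }

    /** One flush tick: retry the failed batch if there is one, else send the buffer. */
    Outcome flush(Transport<P> transport) {
        List<P> batch = retryBatch;
        if (batch == null) {
            if (buffer.isEmpty()) return Outcome.NOTHING_TO_DO;
            batch = new ArrayList<>(buffer);
            buffer.clear();
        }
        if (transport.write(batch)) {
            retryBatch = null;
            return Outcome.WRITTEN;
        }
        if (buffer.size() >= capacity * 3 / 4) { // buffer is 75% full: give up on the batch
            retryBatch = null;
            return Outcome.DUMPED;
        }
        retryBatch = batch;
        return Outcome.RETRY_LATER;
    }

    int buffered() { return buffer.size(); }
}
```

With a capacity of 65536 the dump threshold works out to 49152 points, which matches the 48K figure given above.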

It is an exercise left up to the user how to deal with dumped points, and how to further retry them. I do not see this as the responsibility of the driver itself. The example linked above provides one simple scheme to do so, but there are many others.
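As one illustration of such a user-side scheme (this is not the linked example, which is not reproduced here), an application could park dumped points in a bounded store and replay them once writes succeed again. `SpillStore` is a hypothetical class; how dumped points reach `onDumped` depends on the listener API and is left out, and the store is in-memory only, whereas saving to local disk as requested earlier in the thread would need a persistent implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical application-side store for points the driver gave up on. */
final class SpillStore<P> {
    private final Deque<P> spilled = new ArrayDeque<>();
    private final int maxSpilled;

    SpillStore(int maxSpilled) { this.maxSpilled = maxSpilled; }

    /** Call with the points reported as dumped; the oldest are evicted when full. */
    void onDumped(List<P> points) {
        for (P point : points) {
            if (spilled.size() == maxSpilled) spilled.pollFirst();
            spilled.addLast(point);
        }
    }

    /**
     * Re-submits spilled points in order until the submit function refuses one.
     * Returns how many points were handed back to the driver.
     */
    int replay(Predicate<P> submit) {
        int replayed = 0;
        while (!spilled.isEmpty() && submit.test(spilled.peekFirst())) {
            spilled.pollFirst();
            replayed++;
        }
        return replayed;
    }

    int size() { return spilled.size(); }
}
```

Evicting the oldest points first is a design choice that suits monitoring-style data where recent values matter most; an application that needs every point would instead have to persist the spill or apply back-pressure.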