gogama / incite

Hassle-free queries on Amazon CloudWatch Logs Insights in Go

Don't allow start of time range to equal end

vcschapp opened this issue

Description

With the dynamic chunk splitting feature active and Parallel greater than 1, mgr.getNextReadyChunk() can create a chunk with the empty ("null") time range QuerySpec.End ... QuerySpec.End (at mgr.go lines 132-134).

This bug is described in detail by @artificial-aidan and @pierre-samsara in PR #24.

This results in the StartQuery request to the CloudWatch Logs Insights service failing with the error:

[2023-01-25 23:00:00 +0000 UTC..2023-01-25 23:00:00 +0000 UTC): InvalidParameterException: End time cannot be less than Start time (Service: AWSLogs; Status Code: 400; Error Code: InvalidParameterException;

And this error kills the entire query.

History

This bug was introduced either in v1.2.0 ("Dynamic chunk splitting and progress stats") or in v1.3.0 ("Better query performance through higher concurrency").

Cause

Splitting chunks causes mgr.n to increase, which, depending on the vicissitudes of parallelism, can result in mgr.next < mgr.n and the stream being pushed back into the priority queue even when there are no more chunks available to lazily create. So the next chunk from stream.nextChunkRange() ends up having an empty time range because of this code.

Really the root cause is that mgr.n is trying to capture two distinct concepts at the same time:

  1. How many generation 0 chunks need to be lazily created from a stream living in the priority-ordered stream heap mgr.pq? (A static number.)
  2. How many total chunks are known to be needed to complete the stream at the current time? (A dynamic number which increases with chunk splitting.)

Possible Solution

One option is to create a new field mgr.last to do the job mgr.n was originally meant to do, namely provide the constant number of original chunks requested before splitting. Chunk splitting caused mgr.n to stealthily take on a second duty and that should be reversed.

I have a unit test that replicates the issue now.

I pushed a bug fix to the main branch, so you can pick it up now by depending on the HEAD of main instead of the latest published release.

The fix should be included in the next release, v1.4.0, which is probably 3-4 weeks away. @artificial-aidan, @pierre-samsara, let me know if consuming from HEAD works for you for now. If not, we can cut a temporary release that just includes the fix.

Instead of waiting for v1.4.0, I created a mini-release v1.3.4 which just adds this one bug fix to v1.3.3.

It should be available to consume now.

@artificial-aidan, @pierre-samsara, let me know if this new version seems to fix the problem for you.

@pierre-samsara confirmed the fix via a comment on pull request #24.

Closing this issue, as the fix is now available in v1.3.4.