Flaky tests when transient error/delay happens
rmt2021 opened this issue · comments
Version Information
21710e9
Describe the bug
We find a few unit tests fail undeterministically due to different reasons when the Azure Storage service is unstable:
Akka.Streams.Azure.StorageQueue.Tests.QueueSinkSpec.A_QueueSink_should_add_elements_to_the_queue
Akka.Streams.Azure.StorageQueue.Tests.QueueSinkSpec.A_QueueSink_should_retry_failing_messages_if_supervision_strategy_is_resume
Akka.Streams.Azure.StorageQueue.Tests.QueueSinkSpec.A_QueueSink_should_skip_failing_messages_if_supervision_strategy_is_restart
Akka.Streams.Azure.StorageQueue.Tests.QueueSourceSpecs.A_QueueSource_should_fail_when_an_error_occurs
Akka.Streams.Azure.StorageQueue.Tests.QueueSourceSpecs.A_QueueSource_should_not_fail_if_the_supervision_strategy_is_not_stop_when_an_error_occurs
Akka.Streams.Azure.StorageQueue.Tests.QueueSourceSpecs.A_QueueSource_should_only_poll_if_demand_is_available
Akka.Streams.Azure.StorageQueue.Tests.QueueSourceSpecs.A_QueueSource_should_poll_for_messages_if_the_queue_is_empty
Akka.Streams.Azure.StorageQueue.Tests.QueueSourceSpecs.A_QueueSource_should_push_available_messages
For example, hardcoded timeout is often used in the test code like
t.Wait(TimeSpan.FromSeconds(3)).Should().BeTrue();
which expects t
finishes within 3 seconds.
If the Azure API called by t
gets processed slower than usual (when the service is busy), the test will fail.
Another example is in A_QueueSink_should_skip_failing_messages_if_supervision_strategy_is_restart
probe.SendNext("2");
probe.SendComplete();
await task;
(await Queue.ReceiveMessagesAsync()).Value[0].MessageText.Should().Be("2");
If some transient error happens (like 503 server is busy) when processing the Azure Storage request to send message, the test will fail due to System.IndexOutOfRangeException : Index was outside the bounds of the array.
We are not sure whether such flaky tests (caused by transient error/delay) are desired or not.
If not, would it be better to:
- wait until
t
finishes w/o hardcoded timeout in the first example? - check whether
Value
contains any element before callingValue[0]
in the second example?
If these are considered better practice, we are willing to send a PR for the fix.
check whether Value contains any element before calling Value[0] in the second example?
If these are considered better practice, we are willing to send a PR for the fix.
This is how I would do it. Let me know if you still are available to submit a PR