Add retries
rucek opened this issue
Retries
Background
We'd like to have a retry mechanism supporting:
- various retry schedules:
  - direct retry - retrying up to a given number of times with no delay between subsequent retries,
  - delayed retry - retrying up to a given number of times with a fixed delay between subsequent retries,
  - retry with backoff - retrying up to a given number of times with an increasing delay between subsequent retries; this can optionally include a jitter, i.e. a random factor in the delay between subsequent attempts (see: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
- various definitions of the logic to retry:
  - direct, i.e. f: => T
  - Either, i.e. f: => Either[E, T]
  - Try, i.e. f: => Try[T]
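To make the three schedules concrete, here is a minimal sketch of how the delay before each retry could be computed. All names (Schedule, Jitter, delayBefore) are hypothetical and only illustrate the wish-list above; the jitter formulas follow the "full" and "equal" variants from the linked AWS article, and the decorrelated variant is left out for brevity.

```scala
import scala.concurrent.duration.*
import scala.util.Random

// Hypothetical names, used only for illustration; not an existing API.
enum Jitter:
  case None, Full, Equal

enum Schedule:
  case Direct(maxRetries: Int)
  case Delay(maxRetries: Int, delay: FiniteDuration)
  case Backoff(maxRetries: Int, initialDelay: FiniteDuration, jitter: Jitter)

// Delay to wait before retry number `attempt` (1 = first retry).
// Enforcing maxRetries is left to whatever loop drives the retries.
def delayBefore(schedule: Schedule, attempt: Int, random: Random = Random): FiniteDuration =
  schedule match
    case Schedule.Direct(_)   => Duration.Zero
    case Schedule.Delay(_, d) => d
    case Schedule.Backoff(_, initial, jitter) =>
      // Exponential growth: initial, 2 * initial, 4 * initial, ...
      val backoffMillis = initial.toMillis * (1L << (attempt - 1))
      val jitteredMillis = jitter match
        case Jitter.None  => backoffMillis
        // Full jitter: anywhere between 0 and the computed backoff.
        case Jitter.Full  => (random.nextDouble() * backoffMillis).toLong
        // Equal jitter: half of the backoff is fixed, the other half is random.
        case Jitter.Equal => backoffMillis / 2 + (random.nextDouble() * (backoffMillis / 2)).toLong
      jitteredMillis.millis
```

For example, `delayBefore(Schedule.Backoff(3, 100.millis, Jitter.None), 3)` gives 400 milliseconds under these assumptions.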
Proposed API
Heavily inspired by https://github.com/softwaremill/retry
import ox.retry
import scala.concurrent.duration.*
import scala.util.Try
def foo: Int = ???
def bar: Either[String, Int] = ???
def baz: Try[Int] = ???
// various logic definitions
retry.directly(3)(foo)
retry.directly(3).either(bar)
retry.directly(3).tried(baz) // can't use "try"
// various retry schedules
retry.directly(3)(foo)
retry.delay(3, 100.millis)(foo)
retry.backoff(3, 100.millis)(foo) // optional jitter defaults to Jitter.None
// Either with non-default jitter
retry.backoff(3, 100.millis, Jitter.Full).either(bar) // other jitters are: Equal, Decorrelated
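A rough sketch of what could sit behind the directly/delay part of this API, to show how the apply/either/tried variants relate to each other. Retrier and retrySketch are made-up names, backoff and jitter are omitted, and this is only one possible shape, not the actual ox implementation.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration.*
import scala.util.{Failure, Try}

// Hypothetical sketch; not the actual ox API.
final class Retrier(maxRetries: Int, delay: FiniteDuration):
  private def pause(): Unit =
    if delay > Duration.Zero then Thread.sleep(delay.toMillis)

  // f: => Try[T]; a Failure triggers a retry.
  def tried[T](f: => Try[T]): Try[T] =
    @tailrec def loop(remaining: Int): Try[T] =
      f match
        case Failure(_) if remaining > 0 =>
          pause()
          loop(remaining - 1)
        case result => result
    loop(maxRetries)

  // f: => Either[E, T]; a Left triggers a retry.
  def either[E, T](f: => Either[E, T]): Either[E, T] =
    @tailrec def loop(remaining: Int): Either[E, T] =
      f match
        case Left(_) if remaining > 0 =>
          pause()
          loop(remaining - 1)
        case result => result
    loop(maxRetries)

  // f: => T; a (non-fatal) exception triggers a retry, the last one is rethrown.
  def apply[T](f: => T): T = tried(Try(f)).get

object retrySketch:
  def directly(maxRetries: Int): Retrier = Retrier(maxRetries, Duration.Zero)
  def delay(maxRetries: Int, delay: FiniteDuration): Retrier = Retrier(maxRetries, delay)
```

With this shape, `retrySketch.directly(3)(foo)` runs foo at most four times: the initial attempt plus up to three retries.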
Questions/doubts
- wouldn't it be better to include the retry schedules (directly/delay/backoff) as another parameter to retry, e.g. retry(delay(3, 100.millis), foo) // however, how would we handle Try/Either?
- is there any way to distinguish between f: => T and f: => Try[T] / f: => Either[E, T] without using dedicated functions like tried / either?
Thanks! That's a good start :)
So first I think we might consider extending the wish-list a bit. One thing I'm currently lacking is a way to decide whether retries should be continued or modified based on the result - this can include both a successful result (e.g. -1 can signal a "retry", same as a Left, Result etc.), as well as an exception. Maybe we could handle different outcomes using different retry strategies? E.g. a connection error to a DB would require a delay, but a transaction serializability failure would require an immediate retry.
As for including the schedule as a parameter to retry - I like this idea :). And that's because maybe we can try to unify retry and repeat using a single API? It's not that different ... one retries the computation if the result is somehow "invalid", the second repeats the computation as long as the result is "valid" (e.g. repeat sending a ping message every second).
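To illustrate the unification idea, a minimal sketch in which retry and repeat are the same loop with opposite "keep going" conditions. runWhile and both wrappers are hypothetical names, and delays between runs are left out; a schedule parameter would supply them.

```scala
import scala.annotation.tailrec
import scala.util.Try

// Hypothetical: run `op` up to `maxRuns` times, going on while `continue` says so.
def runWhile[T](maxRuns: Int)(op: => T)(continue: Try[T] => Boolean): Try[T] =
  @tailrec def loop(run: Int): Try[T] =
    val result = Try(op)
    if run < maxRuns && continue(result) then loop(run + 1) else result
  loop(1)

// retry: keep going while the outcome is "invalid" (here: op threw).
def retry[T](maxRetries: Int)(op: => T): Try[T] =
  runWhile(maxRetries + 1)(op)(_.isFailure)

// repeat: keep going while the outcome is "valid" (here: op did not throw).
def repeat[T](times: Int)(op: => T): Try[T] =
  runWhile(times)(op)(_.isSuccess)
```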
Finally, the technical question - distinguishing => T and => Try[T] - this probably can be done using an implicit witness, which would normally be erased. It's the same trick you can use to create function overloads when the erasure of the types is the same.
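One way to read the "implicit witness" suggestion is the DummyImplicit overloading trick, sketched below under that assumption. Both by-name parameters erase to the same function type, so the extra implicit parameter is there to keep the two signatures apart; for an argument of type Try[T], overload resolution should then prefer the more specific variant. retryOnce and the overloads object are hypothetical, and "retry once" stands in for a full schedule.

```scala
import scala.util.{Failure, Try}
import scala.util.control.NonFatal

object overloads:
  // Picked for f: => T. Here "failure" means that f threw.
  def retryOnce[T](f: => T): T =
    try f
    catch case NonFatal(_) => f

  // Picked for f: => Try[T], being the more specific overload.
  // The DummyImplicit parameter keeps the signatures distinct after erasure.
  def retryOnce[T](f: => Try[T])(using DummyImplicit): Try[T] =
    f match
      case Failure(_) => f
      case success    => success
```

One caveat with this approach: an expression whose static type is Failure[Nothing] or Success[Int] still resolves to the Try variant, but a value typed as Any or as a type parameter would silently go to the direct one.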
There are a few concepts in cats-retry that may be worth adopting:
- The notion of operation success (also present in softwaremill/retry) - a function T => Boolean
- The notion of retryable failure, called worthRetrying in cats-retry - a function which decides whether a non-successful operation should be retried.
- RetryPolicy as data, that can be defined separately, or even composed of smaller policies. In this case, a policy means how we should retry when 1. the operation failed and 2. it is worth retrying: immediately, after a delay, etc.
- Side-effecting operations to run on retry - a custom function that we can use for logging, metrics, etc.
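A minimal sketch of how the first, second and fourth concept could fit into one function for the direct (f: => T) case. retryWith and its parameter names are hypothetical (wasSuccessful and isRetryable mirror the names used later in this thread), and the policy is reduced to a plain retry count with no delay.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: success predicate, retryable predicate, side effect on retry.
def retryWith[T](maxRetries: Int)(op: => T)(
    wasSuccessful: T => Boolean = (_: T) => true,
    isRetryable: Try[T] => Boolean = (_: Try[T]) => true,
    onRetry: (Int, Try[T]) => Unit = (_: Int, _: Try[T]) => ()
): Try[T] =
  @tailrec def loop(attempt: Int): Try[T] =
    val result = Try(op)
    val succeeded = result match
      case Success(value) => wasSuccessful(value)
      case Failure(_)     => false
    // Stop on success, when retries are used up, or when the outcome is not worth retrying.
    if succeeded || attempt > maxRetries || !isRetryable(result) then result
    else
      onRetry(attempt, result) // e.g. logging or metrics
      loop(attempt + 1)
  loop(1)
```

For instance, `retryWith(3)(op)(wasSuccessful = _ >= 0)` would treat a returned -1 as a signal to retry, as mentioned earlier in the thread.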
For composing retry policies, cats-retry offers operations like:
- join: If one wants to give up, both give up, for example delay(5.seconds).join(maxRetries(5)), where maxRetries(5) represents retry.directly(5) from your examples. BTW join is IMO not the best name.
- meet: If both want to give up, it gives up. I don't think we want this, I don't see any important beneficial cases other than the idea of capping backoff, which can be handled by backoff settings.
- followedBy: When one gives up, switch to the next one. Sounds pretty useful.
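The three combinators can be sketched with a policy modelled as a function from the retry number to an optional delay, where None means "give up". Policy, decide, maxRetries and delay are hypothetical names chosen to match the examples above, and the delay handling (longer delay for join, shorter for meet) follows my understanding of the cats-retry semantics.

```scala
import scala.concurrent.duration.*

// Hypothetical model: decide(attempt) returns None to give up, or Some(delay) to retry.
final case class Policy(decide: Int => Option[FiniteDuration]):
  // Gives up as soon as either side gives up; otherwise waits for the longer delay.
  def join(other: Policy): Policy = Policy { attempt =>
    (decide(attempt), other.decide(attempt)) match
      case (Some(a), Some(b)) => Some(a.max(b))
      case _                  => None
  }

  // Gives up only when both sides give up; otherwise waits for the shorter delay.
  def meet(other: Policy): Policy = Policy { attempt =>
    (decide(attempt), other.decide(attempt)) match
      case (Some(a), Some(b)) => Some(a.min(b))
      case (a, b)             => a.orElse(b)
  }

  // Uses this policy until it gives up, then switches to the other one.
  def followedBy(other: Policy): Policy = Policy { attempt =>
    decide(attempt).orElse(other.decide(attempt))
  }

object Policy:
  def maxRetries(n: Int): Policy =
    Policy(attempt => if attempt <= n then Some(Duration.Zero) else None)
  def delay(d: FiniteDuration): Policy = Policy(_ => Some(d))
```

In this simplified model the second policy in followedBy sees the overall retry number rather than starting from 1, which may or may not be the desired behaviour.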
- retry(op: => T)(policy) - Use default success condition: for op: => T it's "doesn't throw", for op: => Either[E, T] it's Right(_), for op: => Try[_] it's Success[_]. Use default isRetryable, which means any unsuccessful outcome.
- retry(op: => T, wasSuccessful: T => Boolean)(policy): custom success condition, default retry condition (any unsuccessful outcome)
- retry(op: => T, wasSuccessful: T => Boolean, isRetryable: Try[T] => Boolean)(policy): custom success condition, custom retry condition - depending on exceptions or returned values.
- retry(op: => Either[E, T], isRetryable: Try[E] => Boolean)(policy): default success condition (Right(_)), custom retry condition - depending on exceptions or Left values
etc.
If overloading is too tricky, or turns out to be too difficult for the user to handle, we may consider predefined methods similar to cats-retry, like:
retryOnSomeErrors(op: => Either[E, T], isRetryable: E => Boolean)(policy)
// will not retry if op throws
retryOnSomeResultsAndSomeErrors(op: => Either[E, T], wasSuccessful: T => Boolean, isRetryableError: E => Boolean, isRetryableResult: T => Boolean)(policy)
retryOnFailuresAndSomeErrors(op: => Either[E, T], wasSuccessful: T => Boolean, isRetryable: E => Boolean)(policy)
retryOnThrow(op: => T, isRetryable: Throwable => Boolean)(policy)
etc., the names are of course just examples.
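As an illustration of the first of these, a minimal sketch of retryOnSomeErrors. To keep it self-contained, the policy parameter is replaced with a plain maxRetries count (an assumption, not part of the proposal); exceptions thrown by op are deliberately not caught, so they propagate without a retry.

```scala
import scala.annotation.tailrec

// Hypothetical sketch: retries only on Left values accepted by isRetryable.
// A thrown exception is not caught here, so op throwing ends the whole call.
def retryOnSomeErrors[E, T](maxRetries: Int)(op: => Either[E, T])(
    isRetryable: E => Boolean
): Either[E, T] =
  @tailrec def loop(remaining: Int): Either[E, T] =
    op match
      case Left(error) if remaining > 0 && isRetryable(error) => loop(remaining - 1)
      case result                                             => result
  loop(maxRetries)
```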
Bonus, can be left for implementation in the future:
Adam asked about the possibility to "retry immediately on transaction exceptions and delay on database exceptions".
We can achieve this by adding "evaluation condition" to policies and a special builder:
def op: T = ...
retryCond(op)(
  delay(5.seconds).whenThrownA[DBConnectionException], // creates a ConditionalThrowableRetryPolicy
  immediately.whenThrown {
    case _: TransactionException => true
  }
)
retryCond is an additional operation which may require repeating all the variants, but it simplifies typechecking of the conditions (they have to match the type of the unsuccessful op). This also allows skipping isRetryable, since it can be evaluated by checking the first fulfilled policy condition.
Similarly for other types of op:
def op: Either[E, T] = ...
retryCond(op)(
  resultPolicies = List( // List[ConditionalRetryPolicy[T]]
    delay(5.seconds).when[T](_.code == 300),
    immediately.when[T](_.code != 300)
  ),
  retryableErrorPolicies = List( // List[ConditionalRetryPolicy[E]]
    immediately.whenMatches[E] {
      case _: MyError1 | _: MyError2 => true
    }
  )
)
Conditional policies would act as if composed with 'join', which is IMO enough.
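A rough sketch of the "first fulfilled policy condition" idea for the throwing case. ConditionalPolicy and retryCondSketch are hypothetical stand-ins for the builder syntax above: each policy pairs a delay with a partial function over the thrown exception, the first matching policy decides the delay, and no match means the failure is not retryable. A shared maxRetries cap is an added assumption.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration.*
import scala.util.{Failure, Try}

// Hypothetical: a delay plus the condition under which it applies.
final case class ConditionalPolicy(
    delay: FiniteDuration,
    applies: PartialFunction[Throwable, Boolean]
)

def retryCondSketch[T](maxRetries: Int)(op: => T)(policies: ConditionalPolicy*): Try[T] =
  @tailrec def loop(remaining: Int): Try[T] =
    Try(op) match
      case failure @ Failure(e) if remaining > 0 =>
        // The first policy whose condition holds decides how to retry.
        policies.find(p => p.applies.applyOrElse(e, (_: Throwable) => false)) match
          case Some(policy) =>
            Thread.sleep(policy.delay.toMillis)
            loop(remaining - 1)
          case None => failure // no condition fulfilled: not retryable
      case result => result
  loop(maxRetries)
```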