temporalio / temporal

Temporal service

Home Page:https://docs.temporal.io

Address force completion when a request is made through CompleteByID with a failure.

alexseedkou opened this issue · comments

Is your feature request related to a problem? Please describe.
This is a follow-up feature of this issue.

For an async call, the activity may fail due to a bug or a design drawback after a call to an external server. However, the Temporal server may receive a CompleteByID request from another server while the new attempt, scheduled after a retryable error, has not started yet. In this case, if the request is to complete the activity, we complete it even though the new attempt has not started, so that the workflow is unblocked (see this PR). If the request is to fail the activity, however, we need to decide how to handle it, because we cannot tell whether the failure is meant to fail the activity outright or is a transient error after which the activity should be attempted again.

Describe the solution you'd like
For the request to fail an activity:

  1. If the request is to force-fail an activity, we should fail the activity even when the new attempt following a retryable error has not started yet.
  2. If the request is to fail an activity due to a non-retryable error, we should fail the activity.

Describe alternatives you've considered
We may introduce a separate API or a flag for a client to tell the server that it would like a request to force fail the activity.

Let me take a stab at the explanation.

This PR now allows completing an activity by ID when an activity is backing off between attempts.

There are two other APIs for resolving an activity whose behavior during backoff we considered changing: RespondActivityTaskFailedById and RespondActivityTaskCanceledById.

For RespondActivityTaskCanceledById there's no need to do anything because the server immediately resolves the activity as canceled.
For RespondActivityTaskFailedById, there are a couple of different cases:

  • The failure is non-retryable (e.g. a non-retryable ApplicationFailure, a failure that matches one of the retry policy's non-retryable error types, or a failure reported on the last permitted attempt). In this case, we may want to allow unblocking the workflow and resolving the activity as failed.
  • The failure is retryable. This call should IMHO be a no-op, so that it does not consume an activity attempt that never had a chance to start and does not trigger the next retry backoff.
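
The two cases above can be sketched as a small decision function. This is only an illustration of the proposed behavior, not the server's implementation: the names `Action`, `RetryPolicy`, `Failure`, `is_non_retryable` and `decide_fail_by_id_during_backoff` are hypothetical, and `attempt` is assumed to be the 1-based number of the pending attempt that the reported failure would be counted against.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FAIL_ACTIVITY = "fail_activity"  # resolve the activity as failed and unblock the workflow
    NO_OP = "no_op"                  # ignore the request; the pending retry proceeds as scheduled


@dataclass(frozen=True)
class RetryPolicy:
    # Hypothetical, simplified stand-in for a retry policy.
    maximum_attempts: int = 0  # 0 means "unlimited" in this sketch
    non_retryable_error_types: frozenset = frozenset()


@dataclass(frozen=True)
class Failure:
    error_type: str
    non_retryable: bool = False  # e.g. a non-retryable ApplicationFailure


def is_non_retryable(failure: Failure, policy: RetryPolicy, attempt: int) -> bool:
    """Return True for the three non-retryable cases listed in the comment above."""
    if failure.non_retryable:
        return True
    if failure.error_type in policy.non_retryable_error_types:
        return True
    # A failure on the last permitted attempt leaves nothing to retry.
    return policy.maximum_attempts > 0 and attempt >= policy.maximum_attempts


def decide_fail_by_id_during_backoff(failure: Failure, policy: RetryPolicy, attempt: int) -> Action:
    """Decide what a fail-by-ID request does while the activity is backing off."""
    if is_non_retryable(failure, policy, attempt):
        return Action.FAIL_ACTIVITY
    return Action.NO_OP
```

Under this sketch a caller that wants to "force" the failure marks it as non-retryable, which matches the idea of using a non-retryable application failure as the marker rather than adding a separate flag.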

As @alexseedkou suggested, we may want to consider an explicit flag to "force" the activity to fail, bypassing the retry policy, but I think a non-retryable application failure can already serve as that marker.

Just a note that once this is done, we'll need to update the API and SDK documentation to reflect the fact that RespondActivityTask* and RespondActivityTask*ById behave differently for activities that are in backoff.

> The failure is retryable. This call should IMHO be a noop to avoid wasting an activity attempt without having a chance to get started and triggering the next retry backoff.

There is a feature request to retry an activity immediately while it is waiting for the next retry. So we either introduce another method to force an immediate retry or treat such a Respond call as that request.
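
To make the two readings of a retryable failure during backoff concrete, here is a minimal sketch. Everything in it is hypothetical (`BackoffMode`, `PendingActivity`, `on_retryable_failure_during_backoff`), and it assumes that an immediate retry only moves the scheduled start time forward and leaves the attempt counter unchanged, which the thread does not settle.

```python
from dataclasses import dataclass, replace
from enum import Enum


class BackoffMode(Enum):
    IGNORE = "ignore"        # treat the call as a no-op
    RETRY_NOW = "retry_now"  # treat the call as a request to retry immediately


@dataclass(frozen=True)
class PendingActivity:
    attempt: int            # number of the attempt waiting to start
    next_attempt_at: float  # scheduled start time, in seconds


def on_retryable_failure_during_backoff(
    activity: PendingActivity, now: float, mode: BackoffMode
) -> PendingActivity:
    """Apply one of the two interpretations to an activity that is backing off."""
    if mode is BackoffMode.RETRY_NOW:
        # Cut the remaining backoff short; never push the start time later.
        return replace(activity, next_attempt_at=min(now, activity.next_attempt_at))
    return activity
```

A dedicated method for forcing an immediate retry would correspond to always using `IGNORE` here and exposing the `RETRY_NOW` behavior through a separate call.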