dapr / dotnet-sdk

Dapr SDK for .NET

[Workflows] Cleanup strategies

odzhurik opened this issue

  1. Do we need to purge workflows when they are completed or terminated?

  2. I terminate a workflow, wait for the termination to complete, and then purge it. However, its activity still gets rescheduled:

time="2024-02-28T14:30:01.0010509Z" level=debug msg="Executing reminder for actor dapr.internal.default.proposal-management-workflows.activity||RfpProcessingWorkflow_/test-site-2-20-1/w/04b5583e-2f2c-4706-9cc1-7892ccd85471_key-asks_b827433e-f36b-1410-83e6-0026361783f1::1::1||run-activity" app_id=proposal-management-workflows instance=5d72465eef73 scope=dapr.runtime.actor type=log ver=1.12.0

time="2024-02-28T14:30:01.0013402Z" level=debug msg="Invoking reminder 'run-activity' on activity actor 'RfpProcessingWorkflow_/test-site-2-20-1/w/04b5583e-2f2c-4706-9cc1-7892ccd85471_key-asks_b827433e-f36b-1410-83e6-0026361783f1::1::1'" app_id=proposal-management-workflows instance=5d72465eef73 scope=dapr.runtime.wfengine type=log ver=1.12.0
time="2024-02-28T14:30:01.0014389Z" level=debug msg="activity-processor: processing work item: RfpProcessingWorkflow_/test-site-2-20-1/w/04b5583e-2f2c-4706-9cc1-7892ccd85471_key-asks_b827433e-f36b-1410-83e6-0026361783f1/ProcessRfpReportActivity#1" app_id=proposal-management-workflows instance=5d72465eef73 scope=wfengine.backend type=log ver=1.12.0
time="2024-02-28T14:30:01.0114042Z" level=debug msg="invoking method 'AddWorkflowEvent' on workflow actor 'RfpProcessingWorkflow_/test-site-2-20-1/w/04b5583e-2f2c-4706-9cc1-7892ccd85471_key-asks_b827433e-f36b-1410-83e6-0026361783f1'" app_id=proposal-management-workflows instance=5d72465eef73 scope=dapr.runtime.wfengine type=log ver=1.12.0
time="2024-02-28T14:30:01.0115085Z" level=debug msg="RfpProcessingWorkflow_/test-site-2-20-1/w/04b5583e-2f2c-4706-9cc1-7892ccd85471_key-asks_b827433e-f36b-1410-83e6-0026361783f1: loading workflow state" app_id=proposal-management-workflows instance=5d72465eef73 scope=dapr.runtime.wfengine type=log ver=1.12.0
time="2024-02-28T14:30:01.0123418Z" level=warning msg="RfpProcessingWorkflow_/test-site-2-20-1/w/04b5583e-2f2c-4706-9cc1-7892ccd85471_key-asks_b827433e-f36b-1410-83e6-0026361783f1::1::1: execution failed with a recoverable error and will be retried later: failed to invoke 'AddWorkflowEvent' method on workflow actor: no such instance exists" app_id=proposal-management-workflows instance=5d72465eef73 scope=dapr.runtime.wfengine type=log ver=1.12.0

Is there any approach for cleaning up workflows that are terminated? Will this activity stop getting rescheduled?
Can I pause a workflow, then terminate it, and then purge it?

@cgillum could there be a race condition here where activities are wrongly attempted after the workflow instance has already been terminated and purged?

There may be a race condition here where the following happens:

  1. Workflow schedules an activity
  2. Workflow gets terminated
  3. Activity executes (w/out knowing about the termination)
  4. Workflow gets purged
  5. Activity completion is sent to workflow actor
  6. Workflow actor fails to load workflow state because it's been purged, and then retries

I think it's okay for the race condition to exist, but in this case the big problem is that we treat the state load failure as retriable, which ideally it shouldn't be. I think there needs to be a bug fix to make that particular error non-retriable, which would allow the system to get back into a good state.