uber-go / cadence-client

Framework for authoring workflows and activities running on top of the Cadence orchestration engine.

Home Page:https://cadenceworkflow.io

Non deterministic error for childWorkflow ID after reset

longquanzheng opened this issue · comments

In the Go client, when you start a child workflow without specifying a workflow ID, the default childWF ID is derived from the parent workflow's runID.

See code:

params.workflowID = wc.workflowInfo.WorkflowExecution.RunID + "_" + wc.GenerateSequenceID()

A reset terminates the current run and starts a new run, which has a new runID.
Because the runID changes after a reset, the default childWorkflowID changes too and no longer matches the one recorded in history. That is why it is non-deterministic.
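The effect can be reproduced without a Cadence cluster. The following is a minimal sketch, not the client's actual code: `defaultChildID` is a hypothetical stand-in for the line quoted above, and it assumes the sequence ID is a plain integer counter.

```go
package main

import "fmt"

// defaultChildID is a hypothetical stand-in for the quoted client line:
// parent run ID + "_" + sequence ID. The integer sequence is an assumption.
func defaultChildID(parentRunID string, seq int) string {
	return fmt.Sprintf("%s_%d", parentRunID, seq)
}

func main() {
	// ID that was written to history by the original run.
	recorded := defaultChildID("run-before-reset", 1)
	// ID recomputed when the new run (created by the reset) replays the same step.
	replayed := defaultChildID("run-after-reset", 1)

	fmt.Println("recorded:", recorded)
	fmt.Println("replayed:", replayed)
	fmt.Println("match:", recorded == replayed)
}
```

The sequence number is the same in both runs, so the only thing that differs is the runID prefix, which is exactly the part a reset replaces.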

Ideally the childWorkflow ID shouldn’t depend on the parent runID. The Java client uses a different approach, so it doesn’t have this issue. The Go client should probably follow the same approach.

What you can do before the Cadence team fixes this issue

Prevention is always better

To avoid running into this, please always provide a stable workflowID when starting childWorkflows.

  • Please use SideEffect to generate the UUID if you don't have one from business logic. Example:
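// SideEffect records the generated UUID in history, so replay reuses the same value.
var childID string
workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
	return uuid.New().String()
}).Get(&childID)
cwo := workflow.ChildWorkflowOptions{
	WorkflowID:                   childID,
	ExecutionStartToCloseTimeout: time.Minute,
}
ctx = workflow.WithChildOptions(ctx, cwo)
// ExecuteChildWorkflow returns a future; Get waits for the child and returns its error.
err := workflow.ExecuteChildWorkflow(ctx, XyzChildWorkflow, ...).Get(ctx, nil)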
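If business logic does provide a natural key, a stable child ID can be built from values that do not change on reset. This is a sketch under assumptions: `stableChildID`, the separator, and the `businessKey` parameter are hypothetical and not part of the Cadence API. It relies only on the fact that a reset creates a new run under the same parent workflow ID.

```go
package main

import "fmt"

// stableChildID (hypothetical helper) builds a child workflow ID from the
// parent's workflow ID and a business key. Neither value depends on the
// parent's run ID, so the result is the same before and after a reset.
func stableChildID(parentWorkflowID, businessKey string) string {
	return parentWorkflowID + "/child/" + businessKey
}

func main() {
	// The same inputs always produce the same ID, regardless of which run computes it.
	fmt.Println(stableChildID("order-42", "payment"))
}
```

The resulting string would then be passed as `WorkflowID` in `ChildWorkflowOptions`, in place of the UUID from SideEffect.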

Mitigation

If you already started childWorkflows without specifying workflowIDs
The only way to use reset with those childWorkflows is to prepare a mapping from parent workflow IDs to childWorkflowIDs.

  • First, update the workflow code to use a stable workflowID for childWorkflows in new executions only, using Versioning. Example code:
var childID string
v := workflow.GetVersion(ctx, "startChildWfXyzForReset", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
	// This workflow already executed this code with DefaultVersion and an empty
	// childWF ID, so keep using an empty childWorkflowID like before.
	childID = ""
} else {
	// This workflow executes this code with Version 1, so use a stable
	// workflowID for the child to avoid running into this issue.
	workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
		return uuid.New().String()
	}).Get(&childID)
}

cwo := workflow.ChildWorkflowOptions{
	WorkflowID:                   childID,
	ExecutionStartToCloseTimeout: time.Minute,
}
ctx = workflow.WithChildOptions(ctx, cwo)
// ExecuteChildWorkflow returns a future; Get waits for the child and returns its error.
err := workflow.ExecuteChildWorkflow(ctx, XyzChildWorkflow, ...).Get(ctx, nil)
  • Deploy the code. From then on, all new childWFs are started with a stable WorkflowID. This must be done before the next step so that the mapping does not change while you are working on it.
  • Now build the mapping for the old childWFs that don't have a stable workflowID. You may have to look at the history events of either the parent workflow (StartChildWorkflowExecutionInitiatedEvent) or the childWorkflow (WorkflowExecutionStartedEvent, which could be easier since it is always the first event).
  • After you have the map, update the workflow code to use it when starting childWorkflows, again using Versioning. In replay mode (the childWorkflow has already been started), look up the ID in the map; in executing/non-replay mode, specify a stable workflowID. Example code:
parentExecution := workflow.GetInfo(ctx).WorkflowExecution
var childID string
v := workflow.GetVersion(ctx, "startChildWfXyzForReset", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
	// This workflow already executed this code with DefaultVersion and an empty
	// childWF ID, so look up the ID it actually got. This is backward compatible
	// because the childWFID will match the one in the history.
	childID = childWfIDMap[parentExecution.WorkflowID]
} else {
	// This workflow executes this code with Version 1, so use a stable
	// workflowID for the child to avoid running into this issue.
	workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
		return uuid.New().String()
	}).Get(&childID)
}
cwo := workflow.ChildWorkflowOptions{
	WorkflowID:                   childID,
	ExecutionStartToCloseTimeout: time.Minute,
}
ctx = workflow.WithChildOptions(ctx, cwo)
// ExecuteChildWorkflow returns a future; Get waits for the child and returns its error.
err := workflow.ExecuteChildWorkflow(ctx, XyzChildWorkflow, ...).Get(ctx, nil)
  • Deploy the code. After that, resetting the parent workflow will no longer cause problems.
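The mapping step above can be sketched as follows. This is only an illustration under assumptions: `childStart` is a hypothetical, simplified record of what you would extract from the history events (it is not a Cadence type), and `buildChildIDMap` is a hypothetical helper. It builds the same single-child-per-parent map that the example code indexes by parent workflow ID.

```go
package main

import (
	"errors"
	"fmt"
)

// childStart is a hypothetical, simplified record extracted from history:
// which parent workflow started which child workflow ID.
type childStart struct {
	ParentWorkflowID string
	ChildWorkflowID  string
}

// buildChildIDMap returns parent workflow ID -> child workflow ID.
// It fails if one parent maps to two different child IDs, because a map
// keyed only by parent workflow ID cannot represent that case.
func buildChildIDMap(records []childStart) (map[string]string, error) {
	m := make(map[string]string, len(records))
	for _, r := range records {
		if prev, ok := m[r.ParentWorkflowID]; ok && prev != r.ChildWorkflowID {
			return nil, errors.New("parent " + r.ParentWorkflowID + " has more than one child workflow ID")
		}
		m[r.ParentWorkflowID] = r.ChildWorkflowID
	}
	return m, nil
}

func main() {
	m, err := buildChildIDMap([]childStart{
		{ParentWorkflowID: "order-1", ChildWorkflowID: "run-aaa_5"},
		{ParentWorkflowID: "order-2", ChildWorkflowID: "run-bbb_5"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(m["order-1"])
}
```

If a parent starts several children without IDs, the key would need to include something that distinguishes the call sites, which goes beyond the single-child case shown in the issue.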

The sample has also been updated so that people don't run into this issue: uber-common/cadence-samples#49

The fix is merged