LLNL / UEDGE

Describe the bug
When a case is converged (fnrm < ftol) and icntnunk is set to 1, the next exmain call causes a segfault.

To Reproduce

Navigate to a case that is converged
Ensure the case is well converged by executing exmain until fnrm<ftol
Set bbb.icntnunk=1
Call exmain
Segfault occurs

For the Slab_geometry test case in the pytests, the following commands will reproduce the bug:
from rd_slabH_in_w_h5 import *;bbb.exmain();bbb.exmain();bbb.icntnunk=1;bbb.exmain()

Expected behavior
Execute exmain, print initial fnrm, set iterm=1, return to prompt

Additional context
Several users have encountered this at various points of using the code, most commonly in time-dependent runs where icntnunk is actively switched on and off. The bug does not appear to occur in the Basis versions.

Bug traced to odesetup.m conditional:

UEDGE/bbb/odesetup.m

Line 6555 in cc15c0d

if (icntnunk==1 .and. ijactot<=1 .and. svrpkg=='nksol') then

Segfault occurs due to xerrab call at:

UEDGE/bbb/odesetup.m

Line 6556 in cc15c0d

call xerrab('**Error: need initial Jacobian-pair for icntnunk=1')

Using this information, the expected behavior can be achieved by executing the following commands in the Slab_geometry test directory:
bbb.exmain();bbb.exmain();bbb.icntnunk=1;bbb.ijactot=2;bbb.exmain()

It appears the issue is with ijactot. Presumably, this flag exists to ensure that a Jacobian exists before the code tries to continue under icntnunk=1, which assumes one is available. However, it is not clear to me why ijactot=1 is insufficient for the routine to proceed.

On a general note, xerrab calls in the Python version seem to induce segfaults rather than the expected behavior of printing the error message and returning to the prompt. This should probably be fixed.
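Until xerrab errors return cleanly, the conditional quoted above can be checked from Python before exmain is called. The following is a minimal sketch, assuming bbb.icntnunk and bbb.ijactot can be read as plain integers. checked_exmain is a hypothetical helper, not part of UEDGE, and it omits the svrpkg test, so it assumes svrpkg is 'nksol'. The stub object only stands in for bbb so that the snippet runs without UEDGE installed.

```python
from types import SimpleNamespace


def checked_exmain(bbb):
    """Call bbb.exmain() only if the Fortran guard would not call xerrab.

    Hypothetical helper: mirrors `icntnunk==1 .and. ijactot<=1` from
    odesetup.m. The svrpkg comparison is left out (assumes 'nksol').
    """
    if bbb.icntnunk == 1 and bbb.ijactot <= 1:
        # Raise a Python exception instead of letting xerrab segfault.
        raise RuntimeError(
            "need initial Jacobian-pair for icntnunk=1 "
            f"(ijactot={bbb.ijactot})"
        )
    bbb.exmain()


# Stand-in for the real bbb package, for illustration only.
stub = SimpleNamespace(
    icntnunk=1, ijactot=1, exmain=lambda: print("exmain called")
)
try:
    checked_exmain(stub)
except RuntimeError as err:
    print("refused:", err)
```

With the real package, the call would be checked_exmain(bbb) in place of bbb.exmain().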

The following line(s) appear to explicitly prohibit ijactot from exceeding 1:

UEDGE/bbb/odesetup.m

Line 6559 in cc15c0d

c -- Reinitialize ijactot if icntnunk = 0; prevents ijactot=2 by 2 exmain

However, after the first exmain call, ijactot=2. It appears ijactot is only advanced in jac_calc:

UEDGE/bbb/oderhs.m

Line 8427 in cc15c0d

ijactot = ijactot + 1 #note: ijactot set 0 in exmain if icntnunk=0

During the first exmain call there are 2 Jacobian updates, explaining why ijactot=2. This means exmain can be executed with icntnunk=1 only after a call in which more than one Jacobian update was needed to obtain convergence.

The suggested solution is to change the ijactot<=1 test in the icntnunk=1 conditional to ijactot<1. I don't see why more than one Jacobian evaluation is necessary for exmain calls with icntnunk=1.
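To make the difference between the two thresholds concrete, here is a small stand-alone model of the check. It is only a sketch of the logic described in this issue, not UEDGE code. guard_trips and its proposed_fix argument are hypothetical names, and the svrpkg test is ignored.

```python
def guard_trips(icntnunk, ijactot, proposed_fix=False):
    """Return True if the modelled check would call xerrab.

    proposed_fix=False models the current test (ijactot <= 1);
    proposed_fix=True models the suggested test (ijactot < 1).
    """
    if icntnunk != 1:
        return False
    if proposed_fix:
        return ijactot < 1
    return ijactot <= 1


# Compare both versions of the test for icntnunk=1.
for ijactot in (0, 1, 2):
    print(
        f"ijactot={ijactot}: "
        f"current={guard_trips(1, ijactot)}, "
        f"proposed={guard_trips(1, ijactot, proposed_fix=True)}"
    )
```

The only case where the two tests differ is ijactot=1, which is the case of a single Jacobian evaluation since the last reset.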

One issue arises if the Jacobian calculation is aborted mid-execution. In this case ijactot=1, but the preconditioner is only partially evaluated. If the conditional is relaxed to ijactot<1, an exmain call with icntnunk=1 immediately after the abort then leads to errors. This is because ijactot is incremented at the beginning of subroutine jac_calc, rather than upon successful completion:

UEDGE/bbb/oderhs.m

Line 8427 in cc15c0d

ijactot = ijactot + 1 #note: ijactot set 0 in exmain if icntnunk=0

Moving the above line to around here would solve this issue:

UEDGE/bbb/oderhs.m

Line 8575 in cc15c0d

jcsc(neq+1) = nnz

However, failing the conditional and triggering an xerrab call will still segfault, which is a separate issue that needs to be resolved.
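The effect of where the counter is incremented can be illustrated with a toy model. This is a sketch of the reasoning above only. JacobianCounter, JacobianAborted and count_after_abort are hypothetical names, and the real jac_calc does far more than this.

```python
class JacobianAborted(Exception):
    """Stands in for a Jacobian evaluation interrupted by the user."""


class JacobianCounter:
    """Toy model of ijactot bookkeeping in jac_calc."""

    def __init__(self, increment_at_start):
        self.increment_at_start = increment_at_start
        self.ijactot = 0
        self.jacobian_complete = False

    def jac_calc(self, abort=False):
        self.jacobian_complete = False
        if self.increment_at_start:
            self.ijactot += 1  # current placement: counted before the work
        if abort:
            raise JacobianAborted("evaluation interrupted")
        self.jacobian_complete = True
        if not self.increment_at_start:
            self.ijactot += 1  # suggested placement: counted on completion


def count_after_abort(increment_at_start):
    """Return (ijactot, jacobian_complete) after one aborted evaluation."""
    model = JacobianCounter(increment_at_start)
    try:
        model.jac_calc(abort=True)
    except JacobianAborted:
        pass
    return model.ijactot, model.jacobian_complete


print("increment at start:", count_after_abort(True))
print("increment at end:  ", count_after_abort(False))
```

With the increment at the start, the aborted evaluation still counts, so a relaxed ijactot<1 test would let a continuation call through with an incomplete Jacobian. With the increment at the end, the count stays at 0 and the test still refuses.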

Suggested fix implemented and tested. Closing as resolved.

This has the same underlying cause as #49. See that item for details.

Segfault for icntnunk=1 when case is converged