v15 regression: OptionalArrayInference and InlineSDFGs passes fail
FlorianDeconinck opened this issue
NASA's and NOAA's climate model code running DaCe fails simplify on the OptionalArrayInference and InlineSDFGs passes. With those two passes deactivated, it still fails during codegen with a similar error. The failure appears to be linked to a deep copy inside sdfg.utils.postdominators.
Backtrace:
dsl/pace/dsl/dace/orchestration.py:505: in __call__
return wrapped(*arg, **kwarg)
dsl/pace/dsl/dace/orchestration.py:412: in __call__
return _call_sdfg(
dsl/pace/dsl/dace/orchestration.py:261: in _call_sdfg
res = _build_sdfg(daceprog, sdfg, config, args, kwargs)
dsl/pace/dsl/dace/orchestration.py:155: in _build_sdfg
_simplify(sdfg, validate=False, verbose=True)
dsl/pace/dsl/dace/orchestration.py:115: in _simplify
return SimplifyPass(
external/dace/dace/transformation/passes/simplify.py:106: in apply_pass
result = super().apply_pass(sdfg, pipeline_results)
external/dace/dace/transformation/pass_pipeline.py:547: in apply_pass
newret = super().apply_pass(sdfg, state)
external/dace/dace/transformation/pass_pipeline.py:502: in apply_pass
r = self.apply_subpass(sdfg, p, state)
external/dace/dace/transformation/passes/simplify.py:83: in apply_subpass
ret = p.apply_pass(sdfg, state)
external/dace/dace/transformation/passes/optional_arrays.py:65: in apply_pass
for state in self.traverse_unconditional_states(sdfg):
external/dace/dace/transformation/passes/optional_arrays.py:102: in traverse_unconditional_states
ipostdom = sdutil.postdominators(sdfg)
external/dace/dace/sdfg/utils.py:1541: in postdominators
ipostdom: Dict[SDFGState, SDFGState] = nx.immediate_dominators(sdfg._nx.reverse(), sink)
.venv/lib/python3.8/site-packages/networkx/classes/digraph.py:1219: in reverse
H.add_edges_from((v, u, deepcopy(d)) for u, v, d in self.edges(data=True))
.venv/lib/python3.8/site-packages/networkx/classes/digraph.py:676: in add_edges_from
for e in ebunch_to_add:
.venv/lib/python3.8/site-packages/networkx/classes/digraph.py:1219: in <genexpr>
H.add_edges_from((v, u, deepcopy(d)) for u, v, d in self.edges(data=True))
/home/fgdeconi/.pyenv/versions/3.8.10/lib/python3.8/copy.py:146: in deepcopy
[...]
E TypeError: cannot pickle 'PyCapsule' object
/home/fgdeconi/.pyenv/versions/3.8.10/lib/python3.8/copy.py:161: TypeError
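The last frames show networkx's `DiGraph.reverse` calling `deepcopy` on every edge's data, so anything reachable from the SDFG's interstate edges (presumably including the ASTs held by their code blocks) is deep-copied too. The following is a minimal, stdlib-only sketch of how one extra attribute on an AST node can turn that into the `TypeError` above. It assumes CPython, where `datetime.datetime_CAPI` is a PyCapsule that stands in for whatever non-copyable object was actually reached. The `parent` attribute mimics the field discussed later in this thread, and the `handle` constant is a hypothetical placeholder.

```python
import ast
import copy
import datetime
from typing import Optional, Tuple

# CPython-specific assumption: the C datetime module exposes a PyCapsule.
# It stands in for any live, non-copyable object reachable from the AST.
CAPSULE = datetime.datetime_CAPI


def deepcopy_error(obj: object) -> Optional[str]:
    """Return the TypeError message from copy.deepcopy(obj), or None on success."""
    try:
        copy.deepcopy(obj)
    except TypeError as err:
        return str(err)
    return None


def build_trees() -> Tuple[ast.Module, ast.stmt]:
    """Build an enclosing program plus the small statement a code block would keep."""
    program = ast.parse("handle\nx = a + 1")
    # Hypothetical: preprocessing folded a live object into the enclosing program.
    program.body[0].value = ast.Constant(value=CAPSULE)
    return program, program.body[1]


program, stmt = build_trees()
print(deepcopy_error(stmt))  # None: the statement on its own copies fine

# A back-pointer makes deepcopy walk into the enclosing program as well.
stmt.parent = program
print(deepcopy_error(stmt))  # the message names 'PyCapsule', as in the trace above
```

Without the back-pointer, the copy stays within the statement's own fields. With it, `deepcopy` follows `parent` into the whole enclosing tree and hits the capsule.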
Self-contained reproducer, checking out DaCe v0.15 in pace/external/dace. It clones the model and runs a small numerical regression test that fails with the stack trace above. The relevant source locations are listed in the script's comments.
# Reproducer for the FiniteVolumeTransport regression test
# Original code: fv3core/pace/fv3core/stencils/fvtp2d.py
# DaCe is applied to the FiniteVolumeTransport.__call__ function
# The failing DaCe is in "pace/external/dace"
# Use a dedicated variable instead of overwriting $HOME, which git and pip rely on
WORKDIR="$PWD"
# Get Pace repository
git clone git@github.com:GEOS-ESM/pace
cd pace
git checkout 911368
git submodule update --recursive --init
# Check out DaCe v0.15 *before* installing, so the installed package is v0.15
cd external/dace/
git checkout v0.15
cd "$WORKDIR/pace"
# Set up the venv
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install external/gt4py/
pip install external/dace/
pip install -r requirements_dev.txt -c constraints.txt
# Download data
mkdir -p test_data
cd test_data
wget https://portal.nccs.nasa.gov/datashare/astg/smt/pace-regression-data/8.1.3_c12_6_ranks_standard.FvTp2d.tar.gz
tar -xzvf 8.1.3_c12_6_ranks_standard.FvTp2d.tar.gz
cd "$WORKDIR/pace"
# Run test of FvTp2d
export FV3_DACEMODE=BuildAndRun
export PACE_CONSTANTS=GFS
pytest -v -s --data_path=./test_data/8.1.3/c12_6ranks_standard/dycore \
--backend=dace:cpu --which_modules=FvTp2d --which_rank=0 \
--threshold_overrides_file=./fv3core/tests/savepoint/translate/overrides/standard.yaml \
./fv3core/tests/savepoint
@FlorianDeconinck @alexnick83 After some digging: the failure is caused by MPIResolver adding new fields to the AST nodes of code that ends up in SDFG code blocks (c224013).
Adding a parent field to AST nodes is dangerous. It should be replaced with a dictionary that does not outlive preprocessing, which I have now done in #1446.
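Here is a minimal sketch of the side-table idea described above. It assumes the resolver only needs to look up enclosing nodes. Parents are kept in a dictionary local to a single preprocessing call, so no node gains a `parent` attribute and later deep copies see only the standard AST fields. All function names are hypothetical, and this is not the code from #1446.

```python
import ast
import copy
from typing import Dict, List, Optional, Tuple


def build_parent_map(tree: ast.AST) -> Dict[ast.AST, ast.AST]:
    """Map every child node to its parent without touching the nodes themselves."""
    parents: Dict[ast.AST, ast.AST] = {}
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            parents[child] = node
    return parents


def enclosing_function(node: ast.AST, parents: Dict[ast.AST, ast.AST]) -> Optional[ast.AST]:
    """Walk up the side table until a function definition (or the root) is reached."""
    current = parents.get(node)
    while current is not None and not isinstance(current, ast.FunctionDef):
        current = parents.get(current)
    return current


def find_calls(tree: ast.AST) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical preprocessing step: report each named call and its enclosing function."""
    parents = build_parent_map(tree)  # local: dropped when this function returns
    calls: List[Tuple[str, Optional[str]]] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            func = enclosing_function(node, parents)
            calls.append((node.func.id, func.name if func is not None else None))
    return calls


tree = ast.parse("def step(x):\n    bcast(x)\n\nreport(1)\n")
print(sorted(find_calls(tree)))  # [('bcast', 'step'), ('report', None)]
print(any(hasattr(n, "parent") for n in ast.walk(tree)))  # False
copy.deepcopy(tree)  # the AST carries no extra fields, so this succeeds
```

Because `ast.AST` nodes hash by identity, they can serve directly as dictionary keys. The trade-off is that the parent lookup is unavailable after preprocessing, which is exactly what keeps it out of the SDFG.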
Tested 7ea43c3 and confirmed that it clears the original deep-copy issue.