Wrong signature for spllt_solve() in drivers/spllt_omp_bench
learning-chip opened this issue
Hi @cayrols @flipflapflop, thanks for releasing this code. I saw the paper "Parallelization of the solve phase in a task-based Cholesky solver using a sequential task flow model" and want to reproduce its results. However, there seem to be errors in the driver routines. Below are my attempts and the steps to reproduce them.
Problem description
Compiling drivers/spllt_omp_bench.F90 gives the following errors:
[ 96%] Building Fortran object CMakeFiles/spllt_omp_bench.dir/drivers/spllt_omp_bench.F90.o
/opt/SpLLT/drivers/spllt_omp_bench.F90:296:39:
296 | call spllt_compute_solve_dep(fkeep)
| 1
Error: Missing actual argument for argument 'stat' at (1)
/opt/SpLLT/drivers/spllt_omp_bench.F90:333:55:
333 | workspace=workspace, task_manager=task_manager)
| 1
Error: There is no specific subroutine for the generic 'spllt_solve' at (1)
/opt/SpLLT/drivers/spllt_omp_bench.F90:355:55:
355 | workspace=workspace, task_manager=task_manager)
| 1
Error: There is no specific subroutine for the generic 'spllt_solve' at (1)
make[3]: *** [CMakeFiles/spllt_omp_bench.dir/build.make:63: CMakeFiles/spllt_omp_bench.dir/drivers/spllt_omp_bench.F90.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:102: CMakeFiles/spllt_omp_bench.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:109: CMakeFiles/spllt_omp_bench.dir/rule] Error 2
make: *** [Makefile:118: spllt_omp_bench] Error 2
Attempted fix
The above errors are caused by wrong calling signatures for spllt_compute_solve_dep() and spllt_solve(). The first error is easily fixed by changing call spllt_compute_solve_dep(fkeep) to call spllt_compute_solve_dep(fkeep, stat = st).
The second error is caused by invalid subroutine calls such as:
call spllt_solve(fkeep, options, order, nrhs, sol_computed, info, job=1, &
workspace=workspace, task_manager=task_manager)
...
call spllt_solve(fkeep, options, order, nrhs, sol_computed, info, job=2, &
workspace=workspace, task_manager=task_manager)
spllt_solve() is defined in src/spllt_solve_mod.F90 as:
interface spllt_solve
module procedure spllt_solve_one_double
module procedure spllt_solve_mult_double
module procedure spllt_solve_mult_double_worker
end interface
...
subroutine spllt_solve_one_double(fkeep, options, x, job, info)
...
subroutine spllt_solve_mult_double(fkeep, options, nrhs, x, job, info)
...
subroutine spllt_solve_mult_double_worker(fkeep, options, nrhs, x, &
job, task_manager, info)
Only spllt_solve_mult_double_worker() takes a task_manager argument, and none of the three takes a workspace argument. None of them takes the order argument (the matrix permutation) either, although spllt_omp_bench.F90 passes it.
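The mismatch can be reproduced in miniature. The sketch below is a hypothetical stand-in, not SpLLT code: demo_solve, demo_solve_one and demo_solve_mult are made-up names, and the argument lists are simplified. It shows how a generic call is matched against its specifics by type, rank and keyword name, and why an extra keyword such as workspace= leaves the compiler with no candidate.

```fortran
! Hypothetical miniature of a generic interface; no name here comes from SpLLT.
module demo_solve_mod
  implicit none
  private
  public :: demo_solve

  interface demo_solve
     module procedure demo_solve_one
     module procedure demo_solve_mult
  end interface demo_solve

contains

  ! Single right-hand side: x is rank 1.
  subroutine demo_solve_one(x, job, info)
    real, intent(inout) :: x(:)
    integer, intent(in) :: job
    integer, intent(out) :: info
    x = x * real(job)
    info = 1   ! records which specific procedure ran
  end subroutine demo_solve_one

  ! Several right-hand sides: extra nrhs argument, x is rank 2.
  subroutine demo_solve_mult(nrhs, x, job, info)
    integer, intent(in) :: nrhs
    real, intent(inout) :: x(:, :)
    integer, intent(in) :: job
    integer, intent(out) :: info
    x(:, 1:nrhs) = x(:, 1:nrhs) * real(job)
    info = 2
  end subroutine demo_solve_mult

end module demo_solve_mod

program demo_generic
  use demo_solve_mod, only: demo_solve
  implicit none
  real :: x1(3), x2(3, 2)
  integer :: info

  x1 = 1.0
  x2 = 1.0

  ! Resolves to demo_solve_one: rank-1 x, and both keywords name real dummies.
  call demo_solve(x1, job=1, info=info)
  print *, 'rank-1 call used specific', info

  ! Resolves to demo_solve_mult: integer first, then rank-2 x.
  call demo_solve(2, x2, job=2, info=info)
  print *, 'rank-2 call used specific', info

  ! Uncommenting the next line should fail to compile, because no specific
  ! procedure has a dummy argument named workspace:
  ! call demo_solve(2, x2, job=2, info=info, workspace=x1)
end program demo_generic
```

With gfortran, the commented-out call is expected to produce the same kind of "no specific subroutine for the generic" diagnostic as the driver does.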
Another program, test/test_solve_phasis.F90, compiles correctly, so its call signature should be valid:
SpLLT/test/test_solve_phasis.F90, lines 308 to 311 in 08a181d
I therefore changed the problematic calls in drivers/spllt_omp_bench.F90 to:
call spllt_solve_mult_double_worker(fkeep, options, nrhs, sol_computed, 1, task_manager, info)
...
call spllt_solve_mult_double_worker(fkeep, options, nrhs, sol_computed, 2, task_manager, info)
Error after fix
Now spllt_omp_bench compiles successfully, but it fails at run time with an array bounds error:
Matrix file = matrix.rb
Matrix format = csc
Number of CPUs = 1
Block size = 16
Supernode amalgamation nemin = 32
Reading...
ok
[analysis][prune_tree] nth: 1
[>] [spllt_stf_factorize] setup and activate nodes time: 7.000E-03 s
[>] [spllt_stf_factorize] task insert time: 7.000E-03 s
Allocation of a workspace of size 4.13E+05
#Subtree : 249
At line 961 of file /opt/SpLLT/src/spllt_solve_dep_mod.F90
Fortran runtime error: Index '1' of dimension 1 of array 'fkeep%sbc' above upper bound of 0
Error termination. Backtrace:
#0 0x7feaafed1d21 in ???
#1 0x7feaafed2869 in ???
#2 0x7feaafed2ee6 in ???
#3 0x55a53008b148 in __spllt_solve_dep_mod_MOD_fwd_update_dependency
at /opt/SpLLT/src/spllt_solve_dep_mod.F90:961
#4 0x55a53008bd27 in __spllt_solve_dep_mod_MOD_spllt_compute_blk_solve_dep
at /opt/SpLLT/src/spllt_solve_dep_mod.F90:248
#5 0x55a53008dc70 in __spllt_solve_dep_mod_MOD_spllt_compute_solve_dep
at /opt/SpLLT/src/spllt_solve_dep_mod.F90:271
#6 0x55a530071020 in MAIN__._omp_fn.1
at /opt/SpLLT/drivers/spllt_omp_bench.F90:409
#7 0x7feaafd3878d in ???
#8 0x7feaafa6c608 in ???
#9 0x7feaafc30132 in ???
#10 0xffffffffffffffff in ???
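The runtime message reports an upper bound of 0 for fkeep%sbc, which is how a zero-extent array looks to the bounds checker. It does not by itself say why the array is empty at that point. The sketch below uses a hypothetical demo_keep type (not SpLLT's fkeep type) to show what such an array reports and how an access to index 1 could be guarded while debugging.

```fortran
! Hypothetical stand-in for a derived type with an allocatable component;
! demo_keep is not SpLLT's type, only the component name sbc is borrowed.
program demo_bounds
  implicit none

  type :: demo_keep
     integer, allocatable :: sbc(:)
  end type demo_keep

  type(demo_keep) :: keep

  ! Allocate with zero extent: lower bound 1, upper bound 0.
  allocate(keep%sbc(0))

  print *, 'allocated:', allocated(keep%sbc)
  print *, 'size     :', size(keep%sbc)
  print *, 'ubound   :', ubound(keep%sbc, 1)

  ! Check the extent before touching index 1 instead of relying on the
  ! run-time bounds checker to catch the access.
  if (size(keep%sbc) >= 1) then
     print *, 'first entry:', keep%sbc(1)
  else
     print *, 'sbc is empty; index 1 would be out of bounds'
  end if
end program demo_bounds
```

A similar size check placed just before the failing access in spllt_solve_dep_mod.F90 could help tell whether the array is empty or simply never filled by an earlier phase.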
Reproducible Dockerfile
To ease reproducibility, here's a Dockerfile to generate the compile error I got:
FROM ubuntu:20.04
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y \
git wget vim \
gcc g++ gfortran \
libblas-dev liblapack-dev \
libnuma-dev \
libhwloc-dev \
libmetis-dev \
libudev-dev \
make cmake \
autoconf pkgconf \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /opt
# Build Spral
RUN git clone https://github.com/ralna/spral.git \
&& cd spral \
&& ./autogen.sh \
&& CC=gcc CXX=g++ FC=gfortran ./configure \
--prefix=/opt/spral_install \
--disable-openmp --disable-gpu \
--with-metis="-L/usr/lib/x86_64-linux-gnu -lmetis" \
&& make \
&& make install \
&& cp *.mod /opt/spral_install/include/
# Build SpLLT
RUN git clone https://github.com/NLAFET/SpLLT \
&& cd SpLLT \
&& mkdir -p build/build_omp \
&& cd build/build_omp \
&& mkdir log \
&& CC=gcc CXX=g++ FC=gfortran cmake \
-DRUNTIME=OMP \
-DSPRAL_LIB=/opt/spral_install/lib \
-DSPRAL_INC=/opt/spral_install/include \
-DMETIS_LIB=/usr/lib/x86_64-linux-gnu \
-DMETIS_INC=/usr/include \
../.. 2>&1 | tee log/cmake_spllt_omp.log \
&& make spllt 2>&1 | tee log/make_spllt.log
# `make` or `make all` leads to error at `spllt_omp_bench`
WORKDIR /opt/SpLLT/build/build_omp
RUN make test_solve_phasis 2>&1 | tee log/make_test.log
# prepare test matrix
RUN mkdir /opt/data \
&& cd /opt/data \
&& wget https://suitesparse-collection-website.herokuapp.com/RB/Schmid/thermal1.tar.gz \
&& tar zxvf thermal1.tar.gz
# run test script, success
RUN ln -s /opt/data/thermal1/thermal1.rb matrix.rb \
&& ./test_solve_phasis 2>&1 | tee log/run_test.log
# Build SpLLT driver, get compile error
RUN make spllt_omp_bench 2>&1 | tee log/make_driver.log
Run
docker build -t spllt_debug .
docker run --rm -it spllt_debug
The logs will then be in /opt/SpLLT/build/build_omp/log inside the container.