Defect: internal error when accessing array-coarray of derived type with allocatable component
hsnyder opened this issue
The following program produces incorrect results. The line marked ! NOTE
should print 1, 2, 3, 4, but instead I get "OpenCoarrays internal error on image 2: libcaf_mpi::caf_sendget_by_ref(): can not allocate 0 bytes of memory."
Versions:
GCC 11.2.0
OpenCoarrays 2.9.2
MPICH 3.2
program bug
  type :: container
    integer, allocatable :: stuff(:)
  end type
  type(container) :: co_containers(10)[*]
  if (this_image() == 1) then
    allocate(co_containers(2)%stuff(4))
    co_containers(2)%stuff = [1,2,3,4]
  end if
  sync all
  if (this_image() == 2) then
    print *, co_containers(2)[1]%stuff ! NOTE
  end if
end program
@vehre please let us know if you have the time and interest to work on this issue.
This seems similar to code that I referenced via a link in a comment on issue #700. I'm posting that link again below because the initial comment on issue 700 included only a reduced version of what I was trying to do. I hope to find time to create a separate issue for the larger example, but for now, here it is:
@vehre this has entered our critical path for a paper draft due in December. Any chance you can work on a fix soon?
@rouson I can find some time to work on this, but I have to report that the example in the description works for me on:
- gfortran 11.3.1, fedora 35 system supplied
- mpich 3.4.1, fedora 35 system supplied
- opencoarrays 2.10.1, self-compiled
Only the example behind the link "Link to the somewhat larger demonstrator" crashes on init, with a PMPI_Win_allocate: Invalid topology,... error. If you want me to investigate further, just let me know.
@vehre Yes, please investigate further. @everythingfunctional encountered this same issue yesterday. I'll let him confirm his setup. I have a broken installation at the moment after problems with a macOS upgrade so I'm not able to confirm this immediately.
I'm running with:
- gfortran 12.2.0, Arch Linux system supplied
- openmpi 4.1.4, Arch Linux system supplied
- opencoarrays 2.10.1, self-compiled
The following reproducer is what led me back here:
module payload_m
  implicit none
  private
  public :: payload_t, empty_payload
  type :: payload_t
    !! A raw buffer to facilitate data transfer between images
    !!
    !! Facilitates view of the data as either a string or raw bytes.
    !! Typical usage will be either to
    !! * produce a string representation of the data, and then parse that string to recover the original data
    !! * use the `transfer` function to copy the raw bytes of the data
    private
    integer, allocatable, public :: payload_(:)
  contains
    private
    procedure, public :: raw_payload
    procedure, public :: string_payload
  end type
  interface payload_t
    pure module function from_raw(payload) result(new_payload)
      implicit none
      integer, intent(in) :: payload(:)
      type(payload_t) :: new_payload
    end function
    pure module function from_string(payload) result(new_payload)
      implicit none
      character(len=*), intent(in) :: payload
      type(payload_t) :: new_payload
    end function
    module procedure empty_payload
  end interface
  interface
    pure module function empty_payload()
      implicit none
      type(payload_t) :: empty_payload
    end function
    pure module function raw_payload(self)
      implicit none
      class(payload_t), intent(in) :: self
      integer, allocatable :: raw_payload(:)
    end function
    pure module function string_payload(self)
      implicit none
      class(payload_t), intent(in) :: self
      character(len=:), allocatable :: string_payload
    end function
  end interface
end module
submodule(payload_m) payload_s
  implicit none
contains
  module procedure from_raw
    new_payload%payload_ = payload
  end procedure
  module procedure from_string
    new_payload = payload_t([len(payload), transfer(payload,[integer::])])
  end procedure
  module procedure empty_payload
    empty_payload%payload_ = [integer::]
  end procedure
  module procedure raw_payload
    if (allocated(self%payload_)) then
      raw_payload = self%payload_
    else
      raw_payload = [integer::]
    end if
  end procedure
  module procedure string_payload
    if (allocated(self%payload_)) then
      if (size(self%payload_) > 0) then
        allocate(character(len=self%payload_(1)) :: string_payload)
        if (len(string_payload) > 0) &
          string_payload = transfer(self%payload_(2:),string_payload)
      else
        allocate(character(len=0) :: string_payload)
      end if
    else
      allocate(character(len=0) :: string_payload)
    end if
  end procedure
end submodule
program example
  use payload_m, only: payload_t
  character(len=*), parameter :: MESSAGE = "Hello, World!"
  type(payload_t) :: mailbox[*]
  if (this_image() == 1) then
    mailbox = payload_t(MESSAGE)
  end if
  sync all
  if (this_image() /= 1) then
    mailbox = mailbox[1]
  end if
  print *, mailbox%string_payload(), " from image: ", this_image()
end program
which crashes as follows:
$ caf -g -fbacktrace main.f90 -o main
$ cafrun -n 2 ./main
Hello, World! from image: 1
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f98c1e519ff in ???
#1 0x557dbdb21c12 in __payload_m_MOD_string_payload
at /home/brad/examples/coarray-allocatable-components/main.f90:84
#2 0x557dbdb233b7 in example
at /home/brad/examples/coarray-allocatable-components/main.f90:111
#3 0x557dbdb234ad in main
at /home/brad/examples/coarray-allocatable-components/main.f90:98
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node stray exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Error: Command:
`/usr/bin/mpiexec -n 2 ./main`
failed to run.
@rouson I have analysed the issue further: The code generated for managing the allocatable component of the type in the module does not take into account that the type could be used in a coarray. That is, no space is assigned (at compile time) to keep track of the allocation status and the coarray (slave) token. Because of this, no code is generated to define an MPI window for the allocated memory either, which prevents one image from accessing this memory on another image.
In other words: solving this issue is not a quick fix but a larger effort. One needs to find a way to portably generate the module's code so that it either always calls the coarray-registration routines for every memory allocation in a module when the compile flag -fcoarray=lib is given, or creates two instances of the code for each module, one with coarray support and one without.
How to proceed?
@vehre thanks for the quick reply. If this had been a quick fix, Sourcery Institute could have funded it. Because it's a larger effort, I'll need to seek alternative funding. Please email me a rough estimate at your earliest convenience.
The reproducer submitted in the original comment for this issue has been fixed on the main branch and will appear in the 2.10.1 release, which currently has draft release notes. @everythingfunctional please create a new issue from your comment above and link to @vehre's comment.