heap corruption caused by tst_type_setvalue for MPI_TYPE_MIX_LB_UB
BenWibking opened this issue
AddressSanitizer reports an out-of-bounds write when I run the test suite (built with Clang's AddressSanitizer against OpenMPI 4.1.4):
=================================================================
==62169==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000106d3894f at pc 0x00010455f60c bp 0x00016ba0f850 sp 0x00016ba0f848
WRITE of size 1 at 0x000106d3894f thread T0
#0 0x10455f608 in tst_type_setvalue tst_types.c:984
#1 0x1045600d8 in tst_type_setstandardarray tst_types.c:1012
#2 0x104510810 in tst_p2p_simple_ring_init tst_p2p_simple_ring.c:39
#3 0x10454dd0c in tst_test_init_func tst_tests.c:1453
#4 0x1044b91d8 in main mpi_test_suite.c:455
#5 0x1a38fbe4c (<unknown module>)
0x000106d3894f is located 1 bytes to the left of 1-byte region [0x000106d38950,0x000106d38951)
allocated by thread T0 here:
#0 0x104ca2ca8 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3eca8)
#1 0x104557f6c in tst_type_allocvalues tst_types.c:563
#2 0x1045103a8 in tst_p2p_simple_ring_init tst_p2p_simple_ring.c:30
#3 0x10454dd0c in tst_test_init_func tst_tests.c:1453
#4 0x1044b91d8 in main mpi_test_suite.c:455
#5 0x1a38fbe4c (<unknown module>)
SUMMARY: AddressSanitizer: heap-buffer-overflow tst_types.c:984 in tst_type_setvalue
Full log: mpi_test_suite_heap_corruption.txt
I can also reproduce this on Fedora 37 with gcc 12.2.1:
P2P tests Ring (5/101), comm MPI_COMM_WORLD (1/9), type MPI_TYPE_MIX_LB_UB (29/29)
=================================================================
==124097==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff8000856f at pc 0x00000052d7a8 bp 0xffffc99ba540 sp 0xffffc99ba558
WRITE of size 1 at 0xffff8000856f thread T0
#0 0x52d7a4 in tst_type_setvalue /home/benwibking.linux/mpi-test-suite/tst_types.c:984
#1 0x52ddd8 in tst_type_setstandardarray /home/benwibking.linux/mpi-test-suite/tst_types.c:1012
#2 0x4ec5d8 in tst_p2p_simple_ring_init p2p/tst_p2p_simple_ring.c:39
#3 0x51d35c in tst_test_init_func /home/benwibking.linux/mpi-test-suite/tst_tests.c:1453
#4 0x4a6e5c in main /home/benwibking.linux/mpi-test-suite/mpi_test_suite.c:455
#5 0xffff8541b584 in __libc_start_call_main (/lib64/libc.so.6+0x2b584)
#6 0xffff8541b65c in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2b65c)
#7 0x4055ac in _start (/home/benwibking.linux/mpi-test-suite/mpi_test_suite+0x4055ac)
0xffff8000856f is located 1 bytes to the left of 1-byte region [0xffff80008570,0xffff80008571)
allocated by thread T0 here:
#0 0xffff85d8e500 in malloc (/lib64/libasan.so.8+0xae500)
#1 0x526a60 in tst_type_allocvalues /home/benwibking.linux/mpi-test-suite/tst_types.c:563
#2 0x4ec218 in tst_p2p_simple_ring_init p2p/tst_p2p_simple_ring.c:30
#3 0x51d35c in tst_test_init_func /home/benwibking.linux/mpi-test-suite/tst_tests.c:1453
#4 0x4a6e5c in main /home/benwibking.linux/mpi-test-suite/mpi_test_suite.c:455
#5 0xffff8541b584 in __libc_start_call_main (/lib64/libc.so.6+0x2b584)
#6 0xffff8541b65c in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2b65c)
#7 0x4055ac in _start (/home/benwibking.linux/mpi-test-suite/mpi_test_suite+0x4055ac)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/benwibking.linux/mpi-test-suite/tst_types.c:984 in tst_type_setvalue
Shadow bytes around the buggy address:
0x200ff0001050: fa fa 04 fa fa fa fd fa fa fa fd fa fa fa 04 fa
0x200ff0001060: fa fa fd fa fa fa fd fa fa fa 04 fa fa fa fd fa
0x200ff0001070: fa fa fd fa fa fa fd fa fa fa fa fa fa fa fa fa
0x200ff0001080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x200ff0001090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x200ff00010a0: fa fa fa fa fa fa fa fa fa fa 01 fa fa[fa]01 fa
0x200ff00010b0: fa fa 01 fa fa fa 01 fa fa fa 01 fa fa fa 01 fa
0x200ff00010c0: fa fa 01 fa fa fa 01 fa fa fa 01 fa fa fa 01 fa
0x200ff00010d0: fa fa 01 fa fa fa fd fa fa fa fd fa fa fa fd fa
0x200ff00010e0: fa fa fd fa fa fa 04 fa fa fa fd fa fa fa fd fa
0x200ff00010f0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==124097==ABORTING
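For anyone reading the shadow dump: the legend says one shadow byte represents 8 application bytes, so the shadow address is the application address shifted right by 3 plus a platform-specific offset. Below is a minimal sketch of that mapping. The offset value is an assumption on my part (it is not stated anywhere in the report), chosen because it reproduces the bracketed `[fa]` entry in the `=>0x200ff00010a0` row for this aarch64 Linux run; other platforms use other offsets.

```c
#include <stdint.h>

/* ASSUMPTION: shadow offset for this particular aarch64 Linux run.
   It is platform-specific and inferred from the dump, not documented
   in the report itself. */
#define ASAN_SHADOW_OFFSET_ASSUMED UINT64_C(0x1000000000)

/* One shadow byte covers 8 application bytes, hence the shift by 3. */
static uint64_t asan_shadow_addr(uint64_t app_addr)
{
    return (app_addr >> 3) + ASAN_SHADOW_OFFSET_ASSUMED;
}

/* Offset of the application byte inside its 8-byte shadow granule. */
static unsigned asan_granule_offset(uint64_t app_addr)
{
    return (unsigned)(app_addr & 7u);
}
```

Under that assumption, the faulting address 0xffff8000856f maps to shadow byte 0x200ff00010ad (the `[fa]` heap redzone entry), and the start of the 1-byte region, 0xffff80008570, maps to the next shadow byte, which holds `01` (one addressable byte).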
There is a bug in tst_type_setvalue for MPI_TYPE_MIX_LB_UB.
I ran into this problem, too, on AWS Graviton3 with ACFL 23.10 and OMPI branch v5.0.x.
My observation from gdb and the code is that there is a problem with types whose ub == lb == 0 in the types[] table (line 93 in 109d071). For these types the allocation ends up as malloc(zero), and then some tst_type_setvalue() functions, called by loops in the test suite, write data to these zero-malloc-ed pointers as if they were valid addresses.
This issue may also be the same as issue #7.
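To illustrate the malloc(zero) point: in C, malloc(0) may return either NULL or a unique pointer that must not be dereferenced, so any later write through it is invalid. A minimal sketch of a guarded allocation follows. The helper name `alloc_values_checked` and its parameters are hypothetical and not part of mpi-test-suite; it only shows one way a caller could detect the zero-size case and skip the later writes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of mpi-test-suite: allocate room for
   `count` values of `extent` bytes each, refusing sizes that cannot
   hold any data. */
static void *alloc_values_checked(size_t extent, size_t count)
{
    if (extent == 0 || count == 0)
        return NULL;            /* nothing can be stored; avoid malloc(0) */
    if (count > (size_t)-1 / extent)
        return NULL;            /* extent * count would overflow size_t */
    return malloc(extent * count);
}
```

A caller would treat a NULL return as "skip this type" (or as an error) instead of passing the pointer on to the set-value routines.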
@BenWibking I applied your patch #11, but the segfault still occurs (perhaps at a later point).
MPI_TYPE_MIX_LB_UB does not have lb or ub set to zero. According to the OMPI datatype description, we are looking at:
Datatype 0x137fcb0[] id -1 size 27 align 8 opal_id 0 length 7 used 6
true_lb -17 true_ub 10 (true_extent 27) lb -17 ub 10 (extent 27)
nbElems 6 loops 0 flags 11C4 (committed )-c--lu-GDH-[---][INT]
contain lb ub OPAL_LB:* OPAL_UB:* OPAL_INT1:* OPAL_INT2:* OPAL_INT4:* OPAL_FLOAT4:* OPAL_FLOAT8:* OPAL_LONG:*
--C---P-D--[---][---] OPAL_INT1 count 1 disp 0xffffffffffffffff (-1) blen 1 extent 1 (size 1)
--C---P-DH-[---][---] OPAL_INT2 count 1 disp 0x0 (0) blen 1 extent 2 (size 2)
--C---P-DH-[---][---] OPAL_INT4 count 1 disp 0x2 (2) blen 1 extent 4 (size 4)
--C---P-DH-[---][---] OPAL_LONG count 1 disp 0xfffffffffffffff7 (-9) blen 1 extent 8 (size 8)
--C---P-D--[---][---] OPAL_FLOAT4 count 1 disp 0x6 (6) blen 1 extent 4 (size 4)
--C---P-D--[---][---] OPAL_FLOAT8 count 1 disp 0xffffffffffffffef (-17) blen 1 extent 8 (size 8)
-------G---[---][---] OPAL_LOOP_E prev 6 elements first elem displacement -1 size of data 27
type 11 count ints 9 count disp 8 count datatype 8
ints: 8 1 1 1 1 1 1 1 1
MPI_Aint: -17 -1 0 2 -9 6 -17 10
types: MPI_LB MPI_CHAR MPI_SHORT MPI_INT MPI_LONG MPI_FLOAT MPI_DOUBLE MPI_UB
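To make the effect of the negative displacements concrete, here is a small sketch without any MPI calls. It copies the displacements and element sizes from the dump above and counts how many elements would land outside an allocation, depending on where the base pointer sits inside it. The function name and the `base_off` parameter are my own illustration, not code from the test suite, and I have not confirmed that this is how tst_type_setvalue computes its addresses.

```c
#include <stddef.h>

/* Displacements and sizes copied from the dump above, in order:
   MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE. */
static const long mix_disp[6] = { -1, 0, 2, -9, 6, -17 };
static const long mix_size[6] = {  1, 2, 4,  8, 4,   8 };

/* Count the elements whose bytes fall outside an allocation of
   `alloc_len` bytes when the displacements are applied to a base
   pointer located `base_off` bytes past the start of the allocation.
   (Illustrative helper, not taken from mpi-test-suite.) */
static int count_out_of_bounds(long base_off, long alloc_len)
{
    int bad = 0;
    for (size_t i = 0; i < 6; i++) {
        long first = base_off + mix_disp[i];
        long last = first + mix_size[i];    /* one past the final byte */
        if (first < 0 || last > alloc_len)
            bad++;
    }
    return bad;
}
```

With the base at the start of a 27-byte (extent-sized) allocation, `count_out_of_bounds(0, 27)` reports 3 elements out of bounds (the ones at displacements -1, -9 and -17). Shifting the base by -lb = 17 bytes, `count_out_of_bounds(17, 27)`, reports 0. The MPI_CHAR at displacement -1 appears consistent with the ASan message about a 1-byte write located 1 byte to the left of the region, although that link is an inference from the dump rather than something verified in the debugger.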