open-mpi / mpi-test-suite

Repository from Github https://github.com/open-mpi/mpi-test-suite

heap corruption caused by tst_type_setvalue for MPI_TYPE_MIX_LB_UB

BenWibking opened this issue

AddressSanitizer detects an out-of-bounds write when I run the test suite (built with Clang's AddressSanitizer against Open MPI 4.1.4):

=================================================================
==62169==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000106d3894f at pc 0x00010455f60c bp 0x00016ba0f850 sp 0x00016ba0f848
WRITE of size 1 at 0x000106d3894f thread T0
    #0 0x10455f608 in tst_type_setvalue tst_types.c:984
    #1 0x1045600d8 in tst_type_setstandardarray tst_types.c:1012
    #2 0x104510810 in tst_p2p_simple_ring_init tst_p2p_simple_ring.c:39
    #3 0x10454dd0c in tst_test_init_func tst_tests.c:1453
    #4 0x1044b91d8 in main mpi_test_suite.c:455
    #5 0x1a38fbe4c  (<unknown module>)

0x000106d3894f is located 1 bytes to the left of 1-byte region [0x000106d38950,0x000106d38951)
allocated by thread T0 here:
    #0 0x104ca2ca8 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3eca8)
    #1 0x104557f6c in tst_type_allocvalues tst_types.c:563
    #2 0x1045103a8 in tst_p2p_simple_ring_init tst_p2p_simple_ring.c:30
    #3 0x10454dd0c in tst_test_init_func tst_tests.c:1453
    #4 0x1044b91d8 in main mpi_test_suite.c:455
    #5 0x1a38fbe4c  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow tst_types.c:984 in tst_type_setvalue

Full log: mpi_test_suite_heap_corruption.txt

I can also reproduce this on Fedora 37 with gcc 12.2.1:

P2P tests Ring (5/101), comm MPI_COMM_WORLD (1/9), type MPI_TYPE_MIX_LB_UB (29/29)
=================================================================
==124097==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff8000856f at pc 0x00000052d7a8 bp 0xffffc99ba540 sp 0xffffc99ba558
WRITE of size 1 at 0xffff8000856f thread T0
    #0 0x52d7a4 in tst_type_setvalue /home/benwibking.linux/mpi-test-suite/tst_types.c:984
    #1 0x52ddd8 in tst_type_setstandardarray /home/benwibking.linux/mpi-test-suite/tst_types.c:1012
    #2 0x4ec5d8 in tst_p2p_simple_ring_init p2p/tst_p2p_simple_ring.c:39
    #3 0x51d35c in tst_test_init_func /home/benwibking.linux/mpi-test-suite/tst_tests.c:1453
    #4 0x4a6e5c in main /home/benwibking.linux/mpi-test-suite/mpi_test_suite.c:455
    #5 0xffff8541b584 in __libc_start_call_main (/lib64/libc.so.6+0x2b584)
    #6 0xffff8541b65c in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2b65c)
    #7 0x4055ac in _start (/home/benwibking.linux/mpi-test-suite/mpi_test_suite+0x4055ac)

0xffff8000856f is located 1 bytes to the left of 1-byte region [0xffff80008570,0xffff80008571)
allocated by thread T0 here:
    #0 0xffff85d8e500 in malloc (/lib64/libasan.so.8+0xae500)
    #1 0x526a60 in tst_type_allocvalues /home/benwibking.linux/mpi-test-suite/tst_types.c:563
    #2 0x4ec218 in tst_p2p_simple_ring_init p2p/tst_p2p_simple_ring.c:30
    #3 0x51d35c in tst_test_init_func /home/benwibking.linux/mpi-test-suite/tst_tests.c:1453
    #4 0x4a6e5c in main /home/benwibking.linux/mpi-test-suite/mpi_test_suite.c:455
    #5 0xffff8541b584 in __libc_start_call_main (/lib64/libc.so.6+0x2b584)
    #6 0xffff8541b65c in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2b65c)
    #7 0x4055ac in _start (/home/benwibking.linux/mpi-test-suite/mpi_test_suite+0x4055ac)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/benwibking.linux/mpi-test-suite/tst_types.c:984 in tst_type_setvalue
Shadow bytes around the buggy address:
  0x200ff0001050: fa fa 04 fa fa fa fd fa fa fa fd fa fa fa 04 fa
  0x200ff0001060: fa fa fd fa fa fa fd fa fa fa 04 fa fa fa fd fa
  0x200ff0001070: fa fa fd fa fa fa fd fa fa fa fa fa fa fa fa fa
  0x200ff0001080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff0001090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x200ff00010a0: fa fa fa fa fa fa fa fa fa fa 01 fa fa[fa]01 fa
  0x200ff00010b0: fa fa 01 fa fa fa 01 fa fa fa 01 fa fa fa 01 fa
  0x200ff00010c0: fa fa 01 fa fa fa 01 fa fa fa 01 fa fa fa 01 fa
  0x200ff00010d0: fa fa 01 fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x200ff00010e0: fa fa fd fa fa fa 04 fa fa fa fd fa fa fa fd fa
  0x200ff00010f0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==124097==ABORTING

There is a bug in tst_type_setvalue for MPI_TYPE_MIX_LB_UB.

I ran into this problem, too, on AWS Graviton3 with ACFL 23.10 and OMPI branch v5.0.x.

From gdb and the code, my observation is that there is a problem with types whose ub == lb == 0 in the types[] table (declared as static struct type types[32] = {...}). These cause a malloc(0), and then some tst_type_setvalue() calls, made from loops in the test suite, write data through these zero-size allocations as if they were valid addresses.
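
To illustrate the failure mode described in this comment: in C, malloc(0) may return either NULL or a unique pointer that must not be dereferenced, so any write through it is out of bounds. The following is a minimal sketch of a guard, assuming a hypothetical helper alloc_values_checked(). It is not the suite's tst_type_allocvalues(), whose actual logic is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption for illustration, not code from
 * mpi-test-suite): allocate room for `count` elements of `extent` bytes,
 * refusing zero-size requests instead of handing back a pointer that
 * callers might later write through.
 */
static char *alloc_values_checked(size_t extent, size_t count)
{
    /* A zero extent or count would turn into malloc(0). */
    if (extent == 0 || count == 0)
        return NULL;

    /* Reject requests whose byte count would overflow size_t. */
    if (count > SIZE_MAX / extent)
        return NULL;

    return malloc(extent * count);
}
```

A caller would treat a NULL return as "nothing to fill" and skip the loop that sets values, rather than writing into the buffer unconditionally.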

This issue may be the same as issue #7 as well.

@BenWibking I applied your patch #11, but the segfault still occurs (possibly at a later point).

MPI_TYPE_MIX_LB_UB does not have lb or ub set to zero. According to the OMPI datatype description, we are looking at:

Datatype 0x137fcb0[] id -1 size 27 align 8 opal_id 0 length 7 used 6
true_lb -17 true_ub 10 (true_extent 27) lb -17 ub 10 (extent 27)
nbElems 6 loops 0 flags 11C4 (committed )-c--lu-GDH-[---][INT]
contain lb ub OPAL_LB:* OPAL_UB:* OPAL_INT1:* OPAL_INT2:* OPAL_INT4:* OPAL_FLOAT4:* OPAL_FLOAT8:* OPAL_LONG:*
--C---P-D--[---][---] OPAL_INT1 count 1 disp 0xffffffffffffffff (-1) blen 1 extent 1 (size 1)
--C---P-DH-[---][---] OPAL_INT2 count 1 disp 0x0 (0) blen 1 extent 2 (size 2)
--C---P-DH-[---][---] OPAL_INT4 count 1 disp 0x2 (2) blen 1 extent 4 (size 4)
--C---P-DH-[---][---] OPAL_LONG count 1 disp 0xfffffffffffffff7 (-9) blen 1 extent 8 (size 8)
--C---P-D--[---][---] OPAL_FLOAT4 count 1 disp 0x6 (6) blen 1 extent 4 (size 4)
--C---P-D--[---][---] OPAL_FLOAT8 count 1 disp 0xffffffffffffffef (-17) blen 1 extent 8 (size 8)
-------G---[---][---] OPAL_LOOP_E prev 6 elements first elem displacement -1 size of data 27

type 11 count ints 9 count disp 8 count datatype 8
ints: 8 1 1 1 1 1 1 1 1
MPI_Aint: -17 -1 0 2 -9 6 -17 10
types: MPI_LB MPI_CHAR MPI_SHORT MPI_INT MPI_LONG MPI_FLOAT MPI_DOUBLE MPI_UB
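
The numbers in this dump can be checked by hand. The sketch below is plain C with no MPI calls; the member table is copied from the displacements and sizes listed above, and the helper names (span_low, span_high, alloc_element, fill_element) are hypothetical. It only illustrates that, with members at negative displacements, the pointer used as the element base has to sit 17 bytes into the allocation for every write to stay in bounds. Whether this is the right fix for the suite is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One member of the struct type: byte displacement and size from the dump. */
struct member { ptrdiff_t disp; size_t size; };

static const struct member mix_lb_ub[] = {
    {  -1, 1 }, /* OPAL_INT1   (MPI_CHAR)   */
    {   0, 2 }, /* OPAL_INT2   (MPI_SHORT)  */
    {   2, 4 }, /* OPAL_INT4   (MPI_INT)    */
    {  -9, 8 }, /* OPAL_LONG   (MPI_LONG)   */
    {   6, 4 }, /* OPAL_FLOAT4 (MPI_FLOAT)  */
    { -17, 8 }, /* OPAL_FLOAT8 (MPI_DOUBLE) */
};
enum { MIX_COUNT = sizeof mix_lb_ub / sizeof mix_lb_ub[0] };

/* Lowest byte offset touched relative to the base pointer (never above 0). */
static ptrdiff_t span_low(const struct member *m, size_t n)
{
    ptrdiff_t lo = 0;
    for (size_t i = 0; i < n; i++)
        if (m[i].disp < lo)
            lo = m[i].disp;
    return lo;
}

/* One past the highest byte offset touched (never below 0). */
static ptrdiff_t span_high(const struct member *m, size_t n)
{
    ptrdiff_t hi = 0;
    for (size_t i = 0; i < n; i++) {
        ptrdiff_t end = m[i].disp + (ptrdiff_t)m[i].size;
        if (end > hi)
            hi = end;
    }
    return hi;
}

/*
 * Allocate one element. *raw receives the pointer to pass to free();
 * the return value is the base pointer, shifted so that base + disp
 * stays inside the allocation for every member.
 */
static char *alloc_element(const struct member *m, size_t n, char **raw)
{
    ptrdiff_t lo = span_low(m, n);
    ptrdiff_t hi = span_high(m, n);

    *raw = NULL;
    if (hi == lo)               /* nothing to store: avoid malloc(0) */
        return NULL;
    *raw = malloc((size_t)(hi - lo));
    if (*raw == NULL)
        return NULL;
    return *raw - lo;           /* lo <= 0, so this moves forward */
}

/* Write `value` into every byte of every member. */
static void fill_element(char *base, const struct member *m, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        memset(base + m[i].disp, value, m[i].size);
}
```

For this type the span runs from -17 to 10, matching the true_lb, true_ub, and extent of 27 reported in the dump, and the six members tile those 27 bytes with no gaps.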