ESMCI / cmeps-cime

This is a "fork" of the CIME repository that contains the development version of the NUOPC CMEPS driver and mediator.

I/O failure in writing land restart files

rsdunlapiv opened this issue

The following test fails while CLM is trying to write history files:

/glade/scratch/dunlap/ERS_Vnuopc_Ln5.f45_f45_mg37.I2000Clm50SpNuopc.cheyenne_intel.clm-nuopc_cap.GC.20190423_082340_rewub1/run
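The log that follows aborts on the assertion `(off + size - req_off) == (int)(off + size - req_off)`. An expression of that shape is true only when the value survives a cast to a C `int` unchanged, so it reads as a guard against an offset span that is too large for `int`. The sketch below emulates that check in Python as an aid to reading the log. It assumes a 32-bit two's-complement `int`. The helper names `to_int32` and `span_fits_in_int` and the sample values are hypothetical; this is not the MPI library's code, and it does not establish why the span was too large in this run.

```python
def to_int32(value: int) -> int:
    """Emulate a C cast to a 32-bit signed int (assumed width) by wrapping."""
    value &= 0xFFFFFFFF                      # keep the low 32 bits
    if value >= 0x80000000:                  # high bit set: negative in two's complement
        value -= 0x100000000
    return value


def span_fits_in_int(off: int, size: int, req_off: int) -> bool:
    """Mirror the shape of the assertion: the span must equal its own int cast."""
    span = off + size - req_off
    return span == to_int32(span)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show both outcomes.
    print(span_fits_in_int(0, 2**31 - 1, 0))   # largest span that still fits
    print(span_fits_in_int(0, 2**31, 0))       # one byte more no longer fits
```

Large absolute offsets are not a problem on their own in this sketch: only the difference `off + size - req_off` has to fit, so `span_fits_in_int(2**40, 1024, 2**40)` passes.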

73:cesm.exe: ad_gpfs_wrcoll.c:834: ADIOI_Exch_and_write: Assertion `(off + size - req_off) == (int)(off + size - req_off)' failed.
73:MPT ERROR: Rank 73(g:73) received signal SIGABRT/SIGIOT(6).
73:	Process ID: 65813, Host: r9i6n2, Program: /gpfs/fs1/scratch/dunlap/ERS_Vnuopc_Ln5.f45_f45_mg37.I2000Clm50SpNuopc.cheyenne_intel.clm-nuopc_cap.GC.20190423_082340_rewub1/bld/cesm.exe
73:	MPT Version: HPE MPT 2.19  12/07/18 05:31:15
73:
73:MPT: --------stack traceback-------
73:MPT: Attaching to program: /proc/65813/exe, process 65813
73:MPT: done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=3d290be00d48b823d3b71df2249e80d881bc473d"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=0ea764119690f32c98faae9a63a73f35ed8b1099"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=79264652a62453da222372a430cd9351d4bbcbde"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=5409c48fdb15e90649c1407e444fbe31d6dc8ec1"
73:MPT: (no debugging symbols found)...done.
73:MPT: [Thread debugging using libthread_db enabled]
73:MPT: Using host libthread_db library "/glade/u/apps/ch/os/lib64/libthread_db.so.1".
73:MPT: Try: zypper install -C "debuginfo(build-id)=3a453a18f06ae88bd1b8146bf2ae8fcae5c4c203"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=f43d7754940a14ffe3d9bd8fc9472ffbbfead544"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=e97cfdb062d6f0c41073f2109a7605d0ae991c03"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=15916519d9dbaea26ec88427460b4cedb9c0a6ab"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=4c08f43bb18e99a7df4bad5c4a52bac67ddf9b8d"
73:MPT: (no debugging symbols found)...done.
73:MPT: Try: zypper install -C "debuginfo(build-id)=3ae04b58bd81ea7745dba789d89937e719309568"
73:MPT: (no debugging symbols found)...done.
73:MPT: 0x00002aaab83c841c in waitpid () from /glade/u/apps/ch/os/lib64/libpthread.so.0
73:MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.19-35.1.x86_64
73:MPT: (gdb) #0  0x00002aaab83c841c in waitpid ()
73:MPT:    from /glade/u/apps/ch/os/lib64/libpthread.so.0
73:MPT: #1  0x00002aaab8b08e66 in mpi_sgi_system (
73:MPT: #2  MPI_SGI_stacktraceback (
73:MPT:     header=header@entry=0x7ffffffd85c0 "MPT ERROR: Rank 73(g:73) received signal SIGABRT/SIGIOT(6).\n\tProcess ID: 65813, Host: r9i6n2, Program: /gpfs/fs1/scratch/dunlap/ERS_Vnuopc_Ln5.f45_f45_mg37.I2000Clm50SpNuopc.cheyenne_intel.clm-nuopc_c"...) at sig.c:340
73:MPT: #3  0x00002aaab8b09062 in first_arriver_handler (signo=signo@entry=6, 
73:MPT:     stack_trace_sem=stack_trace_sem@entry=0x2aaac5be0080) at sig.c:489
73:MPT: #4  0x00002aaab8b093fb in slave_sig_handler (signo=6, siginfo=<optimized out>, 
73:MPT:     extra=<optimized out>) at sig.c:564
73:MPT: #5  <signal handler called>
73:MPT: #6  0x00002aaab93a30c7 in raise () from /glade/u/apps/ch/os/lib64/libc.so.6
73:MPT: #7  0x00002aaab93a4478 in abort () from /glade/u/apps/ch/os/lib64/libc.so.6
73:MPT: #8  0x00002aaab939c146 in __assert_fail_base ()
73:MPT:    from /glade/u/apps/ch/os/lib64/libc.so.6
73:MPT: #9  0x00002aaab939c1f2 in __assert_fail ()
73:MPT:    from /glade/u/apps/ch/os/lib64/libc.so.6
73:MPT: #10 0x00002aaab8b435be in ADIOI_Exch_and_write (error_code=0x7ffffffd96b0, 
73:MPT:     buf_idx=0x8280000, fd_end=0x8270000, fd_start=0x8260000, fd_size=8231664, 
73:MPT:     min_st_offset=156944, contig_access_count=12582, len_list=0x19690000, 
73:MPT:     offset_list=0x19610000, others_req=0x196f0000, myrank=0, nprocs=2, 
73:MPT:     datatype=27, buf=0x8558270, fd=0x95bd140) at ad_gpfs_wrcoll.c:834
73:MPT: #11 ADIOI_GPFS_WriteStridedColl (fd=0x95bd140, buf=0x8558270, count=2007552, 
73:MPT:     datatype=27, file_ptr_type=<optimized out>, offset=<optimized out>, 
73:MPT:     status=0x7ffffffd9750, error_code=0x7ffffffd96b0) at ad_gpfs_wrcoll.c:468
73:MPT: #12 0x00002aaab8b76d63 in MPIOI_File_write_all (fh=<optimized out>, 
73:MPT:     offset=62976, file_ptr_type=file_ptr_type@entry=100, buf=<optimized out>, 
73:MPT:     count=2007552, datatype=27, 
73:MPT:     myname=myname@entry=0x2aaab8dba3c0 <myname.15116> "MPI_FILE_WRITE_AT_ALL", 
73:MPT:     status=0x7ffffffd9750) at write_all.c:125
73:MPT: #13 0x00002aaab8b77397 in PMPI_File_write_at_all (fh=<optimized out>, 
73:MPT:     offset=<optimized out>, buf=<optimized out>, count=<optimized out>, 
73:MPT:     datatype=<optimized out>, status=<optimized out>) at write_atall.c:64
73:MPT: #14 0x00000000010b9385 in ncmpio_read_write ()
73:MPT: #15 0x0000000001093384 in req_aggregation ()
73:MPT: #16 0x0000000001090dc6 in wait_getput ()
73:MPT: #17 0x000000000108eccc in req_commit ()
73:MPT: #18 0x0000000000ffbbd2 in ncmpi_wait_all ()
73:MPT: #19 0x0000000000f30901 in flush_output_buffer ()
73:MPT:     at /gpfs/u/home/dunlap/UFSCOMP.apr16/cime/src/externals/pio2/src/clib/pio_darray_int.c:1751
73:MPT: #20 0x0000000000f2a32a in PIOc_write_darray_multi ()