coin-or / Cbc

COIN-OR Branch-and-Cut solver

2.10.9 update crashes in CbcBaseModel::waitForThreadsInTree

jschueller opened this issue · comments

Since the Cbc 2.10.9 update, Cbc crashes when called via Bonmin on Bonmin's simple C++ example (https://github.com/coin-or/Bonmin/tree/master/examples/CppExample):

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

NLP0012I 
              Num      Status      Obj             It       time                 Location
NLP0014I             1         OPT -2.618034       13 0.005785
NLP0014I             2         OPT -2.618034        9 0.003997
NLP0014I             3         OPT -2       10 0.004034
NLP0014I             4         OPT -1.7071068        8 0.003429
NLP0014I             5         OPT -2.5001414       20 0.006979
Cbc0010I After 0 nodes, 1 on tree, 1e+50 best solution, best possible -1.7976931e+308 (0.02 seconds)
NLP0014I             6         OPT -2.5001414       20 0.003155
NLP0014I             7         OPT -1.7071068        8 0.003414
NLP0014I             8         OPT -2.5001414       19 0.006966

Program received signal SIGSEGV, Segmentation fault.
CbcBaseModel::waitForThreadsInTree (this=0x5602947cca40, type=type@entry=2) at CbcThread.cpp:831
831         if (!baseModel->tree()->empty()) {
#0  CbcBaseModel::waitForThreadsInTree (this=0x5602947cca40, type=type@entry=2) at CbcThread.cpp:831
#1  0x00007fba92d30b15 in CbcModel::branchAndBound (this=0x7ffe141d0708, doStatistics=0) at CbcModel.cpp:5194
#2  0x00007fba9378f7ba in Bonmin::Bab::branchAndBound(Bonmin::BabSetupBase&) () from /usr/lib/libbonmin.so.4
#3  0x0000560294328a8b in main (argc=<optimized out>, argv=<optimized out>) at MyBonmin.cpp:80

Careful: the dependent packages vol, bcp, and bonmin have to be recompiled after Cbc is updated to see the exact same error; otherwise the crash might occur elsewhere.
I used the latest release of everything: cgl 0.60.7, clp 1.17.8, osi 0.108.8, coinutils 2.11.8, bonmin 1.8.9, vol 1.5.4, bcp 1.4.4, ipopt 3.14.12.
Everything works if I revert to Cbc 2.10.8 (and rebuild vol/bcp/bonmin).

This is with gcc 12.2.1 on Arch Linux.

CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions \
        -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security \
        -fstack-clash-protection -fcf-protection -g"

LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now"

Could you say more about exactly how you built the projects? What were your configure options? For example, it looks like you built Cbc with --enable-cbc-parallel? I built Bonmin 1.8 with the tip of each dependent stable branch. For the Cbc stack and Bonmin itself, these are identical to the current latest releases. With that setup, everything worked fine on my machine. Can you try that also and see if it works? If so, that will give us a data point for narrowing down the issue.

coinbrew fetch Bonmin@1.8
coinbrew build Bonmin

I tried running coinbrew, but it failed to build Mumps:

$ coinbrew build Bonmin@stable/1.8.9
...
##################################################
### Building ThirdParty/Mumps 1.6 
##################################################

.../ThirdParty/Mumps/MUMPS/src/dmumps_comm_buffer.F:2667:23:

 2658 |         CALL MPI_PACK( WHAT, 1, MPI_INTEGER,
      |                       2
......
 2667 |         CALL MPI_PACK( LIST_SLAVES, NSLAVES, MPI_INTEGER,
      |                       1
Error: Rank mismatch between actual argument at (1) and actual argument at (2) (scalar and rank-1)
.../ThirdParty/Mumps/MUMPS/src/dmumps_comm_buffer.F:2670:23:

I will try to rebuild without --enable-cbc-parallel and bisect cbc next week.

For the Mumps issue, I guess adding ADD_FFLAGS=-fallow-argument-mismatch should fix the problem, per coin-or/coinbrew#47.

I now built the full stack with --enable-cbc-parallel and it still works fine for me.

It also happens with the default compilation flags, so this is not caused by the Arch Linux compilation flags.

It seems to come from 1e6e301. What does this commit do, @jjhforrest?

1e6e3016003941406c5a63dd022c53d958111c35 is the first bad commit
commit 1e6e3016003941406c5a63dd022c53d958111c35
Author: John Forrest <jjhforrest@gmail.com>
Date:   Fri May 27 17:37:23 2022 +0100

    to allow root symmetry orbital

 Cbc/src/CbcModel.cpp    |  388 ++++++++++++-
 Cbc/src/CbcModel.hpp    |   16 +
 Cbc/src/CbcNode.cpp     |  107 ++++
 Cbc/src/CbcSymmetry.cpp | 1377 +++++++++++++++++++++++++++++++++++++++--------
 Cbc/src/CbcSymmetry.hpp |   82 ++-
 5 files changed, 1720 insertions(+), 250 deletions(-)
bisect found first bad commit

The error disappears if I disable nauty.

@tkralphs which distro and compiler do you use? Do you enable nauty?

I am unable to reproduce this error, with or without nauty. The symmetry-breaking code is only entered if the user has set the corresponding option, and that option is not set in the example.

Which distro and compiler are you using?

Bonmin 1.8.9 and gcc 11.3

Is that the version from Ubuntu?
I'm using gcc 12.2.

Ubuntu

I think I figured it out: the private macro COIN_HAS_NTY is used in the public header CbcModel.hpp,
so third-party code doesn't see a class with the same size. I guess this causes the crash with gcc 12: an alignment issue or something similar.
If I just drop the COIN_HAS_NTY guard from the definition of the attributes, it fixes my crash locally.
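
To illustrate the suspected mechanism, here is a minimal sketch of how a macro that is defined only while building a library can change the layout of a class declared in a public header. Everything in it is hypothetical: `HYPOTHETICAL_HAS_NTY` stands in for the private macro, and `Model` with its members is a made-up stand-in, not the real `CbcModel` layout. The two namespaces emulate the two translation units (library and downstream code) that parse the same header text with and without the macro.

```cpp
#include <cstddef>

// Library build: the private macro is defined (assumption for this sketch).
#define HYPOTHETICAL_HAS_NTY 1
namespace library_view {
struct Model {
    int numberNodes;
#ifdef HYPOTHETICAL_HAS_NTY
    void *symmetryInfo;    // present only when the macro is defined
    int rootSymmetryMode;  // present only when the macro is defined
#endif
    void *tree;
};
}  // namespace library_view

// Downstream build: same header text, but the macro is not defined.
#undef HYPOTHETICAL_HAS_NTY
namespace client_view {
struct Model {
    int numberNodes;
#ifdef HYPOTHETICAL_HAS_NTY
    void *symmetryInfo;
    int rootSymmetryMode;
#endif
    void *tree;
};
}  // namespace client_view

// The member `tree` sits at a different offset in each view.
constexpr std::size_t kLibraryTreeOffset = offsetof(library_view::Model, tree);
constexpr std::size_t kClientTreeOffset = offsetof(client_view::Model, tree);

static_assert(sizeof(library_view::Model) > sizeof(client_view::Model),
              "the library expects a larger object than the client allocates");
static_assert(kLibraryTreeOffset != kClientTreeOffset,
              "the two views disagree on where `tree` lives");
```

If downstream code allocates an object using the smaller layout and library code then reads a member at the larger layout's offset, the read is undefined behavior. That would be consistent with a segfault on a pointer member, though this sketch does not prove that it is what happened here.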

Yes, that could be it. The use of COIN_HAS_NTY in public headers was indeed introduced with 1e6e301.
(On master, this is named CBC_HAS_NTY and the define is exported in CbcConfig.h, but not on this ancient stable branch.)

I proposed to drop it, but maybe it's best to define it in the config header as you mentioned.
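
A minimal sketch of the "drop it" option, assuming it means declaring the members unconditionally and keeping the macro only around code that uses them. This is not the actual patch; `HYPOTHETICAL_HAS_NTY`, `SymmetryInfo`, `Model`, and `symmetryCompiledIn` are made-up names for illustration. As before, the two namespaces emulate builds with and without the macro.

```cpp
#include <cstddef>

struct SymmetryInfo;  // hypothetical forward declaration; no definition needed

// Build with the macro defined.
#define HYPOTHETICAL_HAS_NTY 1
namespace built_with_nty {
struct Model {
    int numberNodes;
    SymmetryInfo *symmetryInfo;  // always declared, regardless of the macro
    int rootSymmetryMode;        // always declared, regardless of the macro
    void *tree;
};
// The macro now guards behavior only, never the class layout.
inline bool symmetryCompiledIn() {
#ifdef HYPOTHETICAL_HAS_NTY
    return true;
#else
    return false;
#endif
}
}  // namespace built_with_nty

// Build without the macro: identical class definition.
#undef HYPOTHETICAL_HAS_NTY
namespace built_without_nty {
struct Model {
    int numberNodes;
    SymmetryInfo *symmetryInfo;
    int rootSymmetryMode;
    void *tree;
};
inline bool symmetryCompiledIn() {
#ifdef HYPOTHETICAL_HAS_NTY
    return true;
#else
    return false;
#endif
}
}  // namespace built_without_nty

static_assert(sizeof(built_with_nty::Model) == sizeof(built_without_nty::Model),
              "layout no longer depends on the macro");
static_assert(offsetof(built_with_nty::Model, tree) ==
                  offsetof(built_without_nty::Model, tree),
              "`tree` sits at the same offset in both builds");
```

The cost is a few unused bytes per object in builds without nauty. The config-header alternative keeps the conditional members but requires every consumer to include the header that exports the define.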

The problem with the name COIN_HAS_NTY is that it is too generic. If someone still tries to build Couenne, it could cause a conflict there (probably just a compiler warning, though).

Let's drop it then, as in #593?

I leave this to Ted or John. I don't really see why 1e6e301 has been added to stable/2.10 in the first place. It doesn't look like a bugfix to me and the changelog entry in the readme isn't enlightening either. If they want to keep it, then #593 is also fine with me.

I missed that there were more than just bug fixes being introduced in this release. #593 looks to be the right fix. I will merge and make a new release unless @jjhforrest objects.

Go ahead. I will try to remember not to put anything interesting into stable, just fixes.

Yes, boring stuff only please :]