ecmwf / eckit

A C++ toolkit that supports development of tools and applications at ECMWF.

Home Page: https://confluence.ecmwf.int/display/eckit


Segmentation fault caused by buffer overflow in ChannelBuffer

dsarmany opened this issue

What happened?

On LUMI, with a huge amount of logging due to MULTIO_DEBUG=1, and because the filesystem there is sometimes slow, the default buffer size of 1024 may overflow and cause a crash.

What are the steps to reproduce the bug?

Run an HPC experiment on LUMI, using RAPS and IFS+FESOMv2 or IFS+NEMOv4. Set MULTIO_DEBUG=1. There is a good chance (but no certainty) that the error will occur.

We have not been able to reproduce it in other contexts, but if the buffer size is set to a small value, the expectation is that it should be reproducible more readily.

Version

1.24.4

Platform (OS and architecture)

LUMI/23.03 & cpeCray/23.03

Relevant log output

No response

Accompanying data

No response

Organisation

ECMWF

I want to clarify this further. We are seeing the symptom on LUMI. But the bug is always present.

If, for whatever reason, the flushing of the eckit::Log internal buffers stalls, it is possible to trigger a segfault by doing logging output. This has nothing to do with MULTIO, MULTIO_DEBUG, or LUMI.

This can (presumably) be simply reproduced by removing the flushing code from the eckit::Log infrastructure.

What is nasty about this is that it produces segfaults that are extremely difficult to reproduce, since the user will likely not know that there has been a (potentially intermittent) performance issue on the logging filesystem. We need to make sure that eckit::Log cannot overflow.

Reproduction on LUMI (dev-g):

Reduce the ChannelBuffer size to 1 (or another low value) and use multio-hammer with a lot of threads:
MULTIO_SERVER_CONFIG_PATH=$CLIMATEDT_RUNS_HOME/ifs-bundle/ifs-bundle/source/multio/tests/multio/config FDB_DEBUG=0 MULTIO_DEBUG=0 gdb --args ./bin/multio-hammer --transport=thread --nbclients=16 --nbservers=8

Compiling with BIT or DEBUG (which shows line numbers), the backtraces consistently point to eckit::WrapperTarget::write:

#0  0x00001555529810a6 in eckit::WrapperTarget::write (this=0x293690,
    start=0x3e1000 <error: Cannot access memory at address 0x3e1000>, end=0x36c11f "")
    at /pfs/lustrep3/scratch/project_465000454/phigeier/climatedt-runs/ifs-bundle/ifs-bundle/source/eckit/src/eckit/log/WrapperTarget.cc:39
#1  0x000015555297053e in eckit::ChannelBuffer::dumpBuffer (this=0x36a780)
    at /pfs/lustrep3/scratch/project_465000454/phigeier/climatedt-runs/ifs-bundle/ifs-bundle/source/eckit/src/eckit/log/ChannelBuffer.cc:72
#2  0x0000155552970ccb in eckit::ChannelBuffer::overflow (this=0x36a780, ch=61)
    at /pfs/lustrep3/scratch/project_465000454/phigeier/climatedt-runs/ifs-bundle/ifs-bundle/source/eckit/src/eckit/log/ChannelBuffer.cc:122
#3  0x000015554de21dab in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) ()
   from /usr/lib64/libstdc++.so.6

Eventually WrapperTarget::write gets called with start > end (due to some race conditions...).

A look in the code

void WrapperTarget::write(const char* start, const char* end) {

    const char* begin = start;

    while (start != end) {
        if (*start == '\n') {
            target_->write(begin, start);
            writeSuffix();
            target_->write(start, start + 1);
            prefix_ = true;
            begin   = start + 1;
        }
        else {
            if (prefix_) {
                writePrefix();
                prefix_ = false;
            }
        }
        start++;
    }

    if (begin != end) {
        if (prefix_) {
            writePrefix();
            prefix_ = false;
        }
        target_->write(begin, end);
    }
}

shows that with start > end the loop condition while (start != end) never becomes false: start++ keeps incrementing and the loop runs over memory until a segfault is caused.

More defensive conditions like start < end and begin < end avoid these segfaults (see the sketch below).
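
A minimal sketch of that defensive variant (only the comparisons change; everything else is kept as in the snippet above):

void WrapperTarget::write(const char* start, const char* end) {

    const char* begin = start;

    while (start < end) {  // '<' instead of '!=': stops even if start has somehow passed end
        if (*start == '\n') {
            target_->write(begin, start);
            writeSuffix();
            target_->write(start, start + 1);
            prefix_ = true;
            begin   = start + 1;
        }
        else {
            if (prefix_) {
                writePrefix();
                prefix_ = false;
            }
        }
        start++;
    }

    if (begin < end) {  // same defensive comparison for the trailing chunk
        if (prefix_) {
            writePrefix();
            prefix_ = false;
        }
        target_->write(begin, end);
    }
}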

All in all, the ChannelBuffer (and std::streambuf) is not thread-safe; hence, even when these segfaults are avoided, the output may still come out garbled.

Reason for start being larger than end

WrapperTarget::write gets called by ChannelBuffer::dumpBuffer:

bool ChannelBuffer::dumpBuffer() {
    if (target_) {
        target_->write(pbase(), pptr());
    }
    setp(pbase(), epptr());
    return true;
}

Here we see that pbase() (which is the same as buffer_.data(); likewise epptr() is the same as buffer_.data() + buffer_.size()) somehow gets assigned a wrong value, although it is assumed to be constant.
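
For context, the put area is set up roughly like this (a simplified sketch of the ChannelBuffer construction, assuming a std::vector<char> buffer_ member; not the verbatim eckit source):

ChannelBuffer::ChannelBuffer(std::size_t size) :
    buffer_(size) {
    // From here on, pbase() == buffer_.data() and
    // epptr() == buffer_.data() + buffer_.size(),
    // and dumpBuffer() is supposed to re-establish exactly this put area.
    setp(buffer_.data(), buffer_.data() + buffer_.size());
}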

A possible reason can be found when digging into the implementation (libcxx-14.0.6):

    // 27.6.2.3.3 Put area:
    _LIBCPP_INLINE_VISIBILITY char_type* pbase() const {return __bout_;}
    _LIBCPP_INLINE_VISIBILITY char_type* pptr()  const {return __nout_;}
    _LIBCPP_INLINE_VISIBILITY char_type* epptr() const {return __eout_;}

    inline _LIBCPP_HIDE_FROM_ABI_AFTER_V1
    void pbump(int __n) { __nout_ += __n; }

    _LIBCPP_INLINE_VISIBILITY
    void __pbump(streamsize __n) { __nout_ += __n; }

    inline _LIBCPP_HIDE_FROM_ABI_AFTER_V1
    void setp(char_type* __pbeg, char_type* __pend) {
        __bout_ = __nout_ = __pbeg;
        __eout_ = __pend;
    }

Calling setp should reset __bout_ (which is pbase()) and __nout_ (which is pptr()) to the same value.
However, the assignment is done via __bout_ = (__nout_ = __pbeg). That means __bout_ is technically set to the value of __nout_ and not directly to __pbeg.

In a threaded environment it is possible that the thread calling setp gets interrupted just after the assignment __nout_ = __pbeg. Meanwhile another thread may write data to the buffer (e.g. via xsputn) and increase __nout_.
Although __nout_ is never directly increased to a value larger than the end of the buffer, multiple threads can increase __nout_ in a racing manner such that it ends up larger than the end of the buffer.
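
A sketch of what the chained assignment inside setp effectively does, written out in the order assumed above (the names mirror the libc++ internals quoted earlier; the gap between the two stores is where another thread's pbump or xsputn can interleave):

void setp_expanded(char*& __bout_, char*& __nout_, char*& __eout_,
                   char* __pbeg, char* __pend) {
    __nout_ = __pbeg;   // step 1: pptr() is reset to the start of the buffer
    // <-- another thread may call pbump(n) here, advancing __nout_ again
    __bout_ = __nout_;  // step 2: pbase() picks up the current __nout_, not __pbeg
    __eout_ = __pend;   // step 3: epptr() is set to the end of the buffer
}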

Solutions:

  • Live with garbled output but use defensive approaches, i.e. < instead of != when pointers are compared.
  • Explicit synchronization:
    • Put a mutex on ChannelBuffer. This requires overriding a lot of functions (at least xsputn as well, though that may be just locking the mutex and calling the base implementation; note that overflow may be called recursively!). A sketch follows below.
    • Put a mutex on Channel (which derives from std::ostream).
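
A rough sketch of the first synchronization option (illustrative only; the class name SyncedChannelBuffer is made up, a recursive mutex is used because overflow may be re-entered, and dump() only resets the put area instead of forwarding to a LogTarget):

#include <mutex>
#include <streambuf>
#include <vector>

class SyncedChannelBuffer : public std::streambuf {
public:
    explicit SyncedChannelBuffer(std::size_t size = 1024) :
        buffer_(size) {
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

protected:
    int_type overflow(int_type ch) override {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        dump();
        if (ch != traits_type::eof()) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

    std::streamsize xsputn(const char* s, std::streamsize n) override {
        // Just lock and call the base implementation, as suggested above.
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        return std::streambuf::xsputn(s, n);
    }

    int sync() override {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        dump();
        return 0;
    }

private:
    void dump() {
        // In eckit this is where [pbase(), pptr()) would be handed to the
        // LogTarget; here the put area is simply reset.
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    std::vector<char> buffer_;
    std::recursive_mutex mutex_;
};

This only protects the buffer itself; messages written through the std::ostream layer on top (the Channel) can still interleave across separate << calls, which is what the second option would address.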