sonosaurus / sonobus

Source code for SonoBus, a real-time network audio streaming collaboration tool.

Home Page:https://sonobus.net

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SonoBus crash due to out of bounds access in aoo

Athrunen opened this issue · comments

For some time now my SonoBus has been crashing after an extended runtime of 6+ hours.
Not a problem for short runs but annoying in continuous operation.

When debugging, this line in aoo causes the actual crash:

std::copy(&buffer_[rdhead_], &buffer_[rdhead_ + n1], buffer);

When the read exceeds the length of the local buffer, rdhead_ + n1 is supposed to read to buffer_.size() - 1 but instead reads to buffer_.size() which is of course out of bounds.

inline int32_t SLIP::read_bytes(uint8_t *buffer, int32_t size){
auto capacity = (int32_t)buffer_.size();
if (size > balance_){
size = balance_;
}
auto end = rdhead_ + size;
int32_t n1, n2;
if (end > capacity){
n1 = capacity - rdhead_;
n2 = end - capacity;
} else {
n1 = size;
n2 = 0;
}
std::copy(&buffer_[rdhead_], &buffer_[rdhead_ + n1], buffer);
std::copy(&buffer_[0], &buffer_[n2], buffer + n1);
rdhead_ += size;
if (rdhead_ >= capacity){
rdhead_ -= capacity;
}
balance_ -= size;
return size;
}

The fix should be to just subtract 1 from n1, but I currently do not have the time to test that.

When the read exceeds the length of the local buffer, rdhead_ + n1 is supposed to read to buffer_.size() - 1 but instead reads to buffer_.size() which is of course out of bounds.

C++ generally uses half-open ranges, i.e. the end iterator points to one element past the end of the sequence.

The only problem I see with this code is that it may fail with Visual Studio debug builds if checked iterators are enabled (https://learn.microsoft.com/en-us/cpp/standard-library/checked-iterators?view=msvc-170). &buffer_[buffer_.size()] should be harmless because the actual item is never dereferenced, but a checked [] operator would complain nevertheless. The code can be rewritten as buffer_.begin() + n.

Anyway, the code should work fine in release builds. Can you post a full stack trace?

I did some napkin math because some values are optimized out and that off by one error was my best guess, gonna recompile in Debug now.. but this is the BT for now:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff76a5463 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff764c120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff76334c3 in __GI_abort () at abort.c:79
#4  0x00007ffff5ed3bb0 in std::__glibcxx_assert_fail
(file=file@entry=0x7fffa6278be0 "/usr/include/c++/14.2.1/bits/stl_vector.h", line=line@entry=1130, function=function@entry=0x7fffa627a7e8 "std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = unsigned char; _Alloc = std::allocator<unsigned char>; reference = unsigned char&; size_type = long unsi"..., condition=condition@entry=0x7fffa62700d4 "__n < this->size()") at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/assert_fail.cc:41
#5  0x00007fffa5ec79c0 in std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (this=<optimized out>, __n=<optimized out>)
at /usr/include/c++/14.2.1/bits/stl_vector.h:1128
#6  std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/14.2.1/bits/stl_vector.h:1128
#7  aoo::SLIP::read_bytes (size=26, this=<optimized out>, buffer=0x7fffa5bff890 "\300/aoo/server/ping") at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/SLIP.hpp:59
#8  aoo::net::client::send_server_message_tcp (this=0x555556255d50, data=<optimized out>, size=<optimized out>) at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/client.cpp:918
#9  0x00007fffa5ebe6e9 in aoo::net::client::send_server_message_tcp (this=0x555556255d50, data=<optimized out>, size=<optimized out>)
at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/client.cpp:222
#10 aoo::net::client::run (this=0x555556255d50) at /usr/src/debug/sonobus/sonobus-1.7.2/deps/aoo/lib/src/client.cpp:222
#11 0x00007fffa607f760 in juce::Thread::createNativeThread(juce::Thread::Priority)::{lambda(void*)#1}::_FUN(void*) [clone .lto_priv.0] ()
at /usr/src/debug/sonobus/sonobus-1.7.2/deps/juce/modules/juce_core/threads/juce_Thread.cpp:98
#12 0x00007ffff76a339d in start_thread (arg=<optimized out>) at pthread_create.c:447
#13 0x00007ffff772849c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thanks! Just as I expected, the crash in the debug build is caused by debug iterators. However, this is not the source of the original crash. I think the quickest solution is to compile with RelWithDebInfo instead of Debug. All assertions will be disabled, but you still get (some) debug symbols. This way we can at least narrow down the actual source of the issue.

Weird, the above stacktrace was with RelWithDebInfo, wich is why some stuff is optimized out.
But I'm compiling this by modifying the aur PKGBUILD which might introduce some weirdness.

That is weird indeed! RelWithDebInfo isn't supposed to enable assertions. I'd suggest to follow the official build instructions.

Found the problem: Arch includes _GLIBCXX_ASSERTIONS by default. Even in release builds. So the crashes might just have been failed asserts.

Adding -Wp,-U_GLIBCXX_ASSERTIONS to the CXXFLAGS should disable the asserts for now.

I will test out the new build and reach out to the aur maintainer in the meantime.

Arch includes _GLIBCXX_ASSERTIONS by default.

Ouch! Good catch! Why do they do this? It's terrible for performance!

&buffer_[buffer_.size()] should be harmless because the actual item is never dereferenced

no, forming a reference to an out-of-bounds element is UB. It doesn't matter if / how the reference is used afterwards.
If you really do want to form a past-the-array address, I think you may be able to with casting to std::uintptr_t and doing arithmetic on it - but ideally just don't?

A &buffer_[std::min(rdhead_ + n1, buffer_.size() - 1)] should do just fine.

EDIT: no it won't due to the last element. Probably best to use std::copy_n here.

Why do they do this?

Because developers don't know the language semantics, evidently.

&buffer_[buffer_.size()] should be harmless because the actual item is never dereferenced

no, forming a reference to an out-of-bounds element is UB

No, forming a reference to one element past the end of the array is completely valid! How else would you implement end() iterators? &buffer_[buffer_.size()] would be 100% valid C code. The only problem really is the debug bound check in the [] operator. (The [] member function does not know that we are immediately taking the address, so it aborts nevertheless.)

Probably best to use std::copy_n here.

Or use buffer_.begin() + n, as I've written above. They really do the same. Here's the implementation of std::copy_n in GCC's libstdc++:

 __copy_n(_RandomAccessIterator __first, _Size __n,
	     _OutputIterator __result, random_access_iterator_tag)
    { return std::copy(__first, __first + __n, __result); }

Yes sorry, from the original I read rdhead_ as being equal to buffer.size(), not rdhead_ + 1.

Either way, _GLIBCXX_ASSERTIONS are meant for production use similar to FORTIFY_SOURCE. RHEL and Fedora have been using it for a long time, Arch and Gentoo (and I think also Suse?) do too. libc++ has also recently adopted a similar setting that will also check operator[] for < size().

Either way, _GLIBCXX_ASSERTIONS are meant for production

It is first and foremost a debug setting. IMO adding range-checks to operator[] by default - even in release builds - breaks C++'s principle of zero-cost abstractions. After all, std::vector already has the at member function... But then again, it's not my business how other people compile my code. If they want these bound checks, so be it.