ned14 / status-code

Proposed SG14 status_code for the C++ standard

`status_code` erasure casting of integer value types is broken on BigEndian machines

dkrejsa opened this issue

The following code causes a big-endian 64-bit target (LP64 data model) to terminate the program via std::terminate():

#include <boost/outcome/experimental/status_result.hpp>

using BOOST_OUTCOME_SYSTEM_ERROR2_NAMESPACE::error;
using BOOST_OUTCOME_SYSTEM_ERROR2_NAMESPACE::errc;

int main ()
    {
    error ec(errc::no_link);
    return 0;
    }

The problem appears to occur because the non-zero errc::no_link value is stored as an 8-byte, big-endian intptr_t value after the domain pointer in the error object, so that the non-zero part of the error code value is stored in the last 4 bytes. However, when the time comes to check that the value represents an error rather than a success case, the _generic_code_domain's _do_failure() method casts the const status_code<void> & code argument as a const generic_code &, and generic_code uses a 4-byte errc value_type member immediately following the domain pointer.

  virtual bool _do_failure(const status_code<void> &code) const noexcept override  // NOLINT
  {
    assert(code.domain() == *this);                                           // NOLINT
    return static_cast<const generic_code &>(code).value() != errc::success;  // NOLINT
  }

So when the value() is computed, it takes the 4 bytes immediately following the domain pointer that were the most-significant 4 bytes of the intptr_t value representing errc::no_link in the error object, rather than the least-significant 4 bytes. Since the most-significant 4 bytes are zero, the value looks like a success value rather than an error code, and the errored_status_code _check() function terminates the process.
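The mismatch described above can be reproduced on any host by modelling the byte layout explicitly. The following is a minimal sketch, not code from status-code: `byte_order`, `store_u64` and `load_leading_u32` are hypothetical helpers, and 67 is the value that `li r9,67` loads in the repro listing further down the thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in status-code.
enum class byte_order { little, big };

// Lay out a 64-bit value as 8 bytes in the given byte order, the way an
// LP64 target would store an intptr_t in memory.
inline std::array<std::uint8_t, 8> store_u64(std::uint64_t v, byte_order order)
{
  std::array<std::uint8_t, 8> bytes{};
  for (std::size_t i = 0; i < 8; ++i) {
    const std::size_t shift = (order == byte_order::little) ? 8 * i : 8 * (7 - i);
    bytes[i] = static_cast<std::uint8_t>(v >> shift);
  }
  return bytes;
}

// Read only the first 4 bytes of that storage as a 32-bit value in the same
// byte order, the way a 4-byte value_type at the same address would be read.
inline std::uint32_t load_leading_u32(const std::array<std::uint8_t, 8> &bytes, byte_order order)
{
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    const std::size_t shift = (order == byte_order::little) ? 8 * i : 8 * (3 - i);
    v |= static_cast<std::uint32_t>(bytes[i]) << shift;
  }
  return v;
}

inline void demonstrate_mismatch()
{
  const std::uint64_t widened = 67;  // small value widened to 64 bits with static_cast

  // Little-endian: the low-order bytes come first, so the 4-byte read sees 67.
  assert(load_leading_u32(store_u64(widened, byte_order::little), byte_order::little) == 67u);

  // Big-endian: the first 4 bytes are the high-order half, which is all zero.
  assert(load_leading_u32(store_u64(widened, byte_order::big), byte_order::big) == 0u);

  // If the value is instead placed in the upper half of the 64-bit word,
  // the leading 4 bytes on big-endian hold 67 again.
  assert(load_leading_u32(store_u64(widened << 32, byte_order::big), byte_order::big) == 67u);
}
```

Calling `demonstrate_mismatch()` passes on any host, because the byte order is simulated rather than taken from the machine.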

Seen using Boost 1.81 on VxWorks, built for LP64 on a Freescale P5020DS board.

Yes, the erasure casting of status_code<erased<...>> seems to have been written with only LittleEndian in mind:

/* erasure_cast performs a bit_cast with additional rules to handle types
of differing sizes. For integral & enum types, it may perform a narrowing
or widening conversion with static_cast if necessary, before doing the final
conversion with bit_cast. When casting to or from non-integral, non-enum
types it may insert the value into another object with extra padding bytes
to satisfy bit_cast's preconditions that both types have the same size. */

Removing these static_cast overloads should fix the bug:

SYSTEM_ERROR2_TEMPLATE(class To, class From, long = 5)
SYSTEM_ERROR2_TREQUIRES(SYSTEM_ERROR2_TPRED(is_erasure_castable<To, From>::value &&is_static_castable<To, From>::value && (sizeof(To) < sizeof(From)))) constexpr To erasure_cast(const From &from) noexcept { return static_cast<To>(bit_cast<erasure_integer_type<From, To>>(from)); }
SYSTEM_ERROR2_TEMPLATE(class To, class From, int = 5)
SYSTEM_ERROR2_TREQUIRES(SYSTEM_ERROR2_TPRED(is_erasure_castable<To, From>::value &&is_static_castable<To, From>::value && (sizeof(To) > sizeof(From)))) constexpr To erasure_cast(const From &from) noexcept { return bit_cast<To>(static_cast<erasure_integer_type<To, From>>(from)); }

as the overloads that work with explicit padding would do the Right™ thing in any case, though I don't know why the static_cast ones were added in the first place (enhanced C++14/17 compat?).
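To illustrate why a padding-based conversion is independent of byte order, here is a rough sketch. It assumes the padding path amounts to copying the object's bytes to the start of a larger zeroed object; `padded_erase`, `padded_unerase` and `demo_errc` are hypothetical names and this is not the library's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for an error enum with a 4-byte underlying type.
enum class demo_errc : std::int32_t { success = 0, no_link = 67 };

// Copy the bytes of a small object to the START of a larger zeroed object.
// The bytes stay at the same offsets regardless of host byte order.
template <class To, class From> To padded_erase(const From &from) noexcept
{
  static_assert(sizeof(To) >= sizeof(From), "destination must be at least as large");
  static_assert(std::is_trivially_copyable<To>::value && std::is_trivially_copyable<From>::value,
                "both types must be trivially copyable");
  To to{};
  std::memcpy(&to, &from, sizeof(From));
  return to;
}

// Recover the small object by reading the leading bytes of the larger one,
// which is what a narrower value_type at the same address would see.
template <class To, class From> To padded_unerase(const From &from) noexcept
{
  static_assert(sizeof(To) <= sizeof(From), "destination must not be larger");
  static_assert(std::is_trivially_copyable<To>::value && std::is_trivially_copyable<From>::value,
                "both types must be trivially copyable");
  To to{};
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

inline void round_trip_example()
{
  const std::intptr_t erased = padded_erase<std::intptr_t>(demo_errc::no_link);
  // The round trip holds on little-endian and big-endian hosts alike.
  assert(padded_unerase<demo_errc>(erased) == demo_errc::no_link);
}
```

The numeric value of `erased` differs between little-endian and big-endian hosts, which is acceptable for an opaque erased payload; only the round trip has to be stable.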

Repro on godbolt, note the

li      r9,67
addi    r4,r1,112
std     r9,120(r1)

in .main vs the load in generic_code::_do_failure

lwz     r3,8(r4)

That code was donated, though as I accepted it, it's on me.

I generally call #error when I write endian specific code and I detect big endian. Not because I don't support it, but because I have zero way of ever testing big endian code.
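A guard of that kind might look roughly like the sketch below. It assumes the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__`, which other compilers may not define, so a runtime probe is included as a fallback. The function names are hypothetical and nothing here is taken from status-code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compile-time guard: refuse to build endian-specific code for a big-endian
// target. The macros are GCC/Clang predefines; where they are absent, the
// guard is silently skipped.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#error "This code assumes a little-endian target and is untested on big-endian"
#endif
#endif

// Runtime fallback: inspect the first byte of a known 32-bit value.
inline bool host_is_little_endian() noexcept
{
  const std::uint32_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical helper to call once at start-up in debug builds.
inline void assert_little_endian_host() noexcept
{
  assert(host_is_little_endian());
}
```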

I've just arrived at the WG21 Varna meeting so there is an excellent chance I'll get this fixed this week. Thanks for reporting this issue, I'll be honest in saying I never expected any of my code to ever run on big endian.

Thank you both! I'll keep an eye on this.

Try that fix and let me know how you get on.

At least the optimized assembly looks like it's correct now:

lis     r9,67       ; r9 = 0x0000'0000'0043'0000
rldicr  r9,r9,16,47 ; r9 = 0x0000'0043'0000'0000
addi    r4,r1,112   ; save address of the "code" parameter for `_do_failure` in the second argument register
std     r9,120(r1)

Hi,
I modified the header boost/outcome/experimental/status-code/config.hpp based upon the changes to
include/status-code/config.hpp in the commits

52692be
1be0aa2

I rebuilt and re-ran on the LP64 Big Endian P5020ds target all the Boost outcome tests from Boost 1.81, including the two tests
experimental-core-outcome-status.test and experimental-core-result-status.test that previously
failed with a call to std::terminate() on this target. All those tests passed. Thanks very much!

Am I correct that the changes in single-header/system_error2-nowindows.hpp, single-header/system_error2.hpp, and test/main.cpp would apply to the stand-alone status-code or outcome/experimental setup and not to a vanilla Boost installation?

Now you've confirmed this fixes things for you, I'll cycle the Outcome release to include this fix, so it should turn up in Standalone Outcome by tomorrow.

There is no reason it should not appear in Boost.Outcome in the next Boost release.

Thanks for reporting this issue, and for testing that the fix works!