tidwall / neco

Concurrency library for C (coroutines)

Undefined behavior detected with mismatched alignments

deckarep opened this issue · comments

Hi @tidwall,

I'm the one who posted the Raylib bunnymark X (Twitter) post and tagged you and also previously asked about the "waiting" for coroutines to start.

I may have detected some undefined behavior caused by casts internal to neco. Zig enables LLVM's undefined behavior sanitizer for debug builds, and it adds some other safety checks that plain C builds don't have, so it tends to be more aggressive about catching issues.

I found that the following two types do not have the same alignment.

typedef struct { char _[48]; } neco_waitgroup;

vs.

struct neco_waitgroup {
    int64_t rtid;
    int count;
    struct colist queue;
};

The good news is you have a static_assert safety check ensuring the sizes are at least big enough:

static_assert(sizeof(neco_waitgroup) >= sizeof(struct neco_waitgroup), "");

But the alignments do not match, at least on my architecture (macOS on Intel x86-64), so the following assertion fails:

static_assert(_Alignof(neco_waitgroup) == _Alignof(struct neco_waitgroup), "Alignment mismatch");

The opaque type in the neco.h file has an alignment of 1 and the internal type has an alignment of 16.

Adding this extra safety check makes the compiler catch the issue:

static_assert(sizeof(neco_waitgroup) >= sizeof(struct neco_waitgroup), "");
static_assert(_Alignof(neco_waitgroup) == _Alignof(struct neco_waitgroup), "Alignment mismatch");

And finally, forcing the alignment on the opaque type fixes the issue:

typedef struct { _Alignas(16) char _[48]; } neco_waitgroup;

In your codebase, there are other opaque types exported. I know that the neco_cond opaque type likely has the same issue, because I failed to get that working as well. Oddly, these issues have been showing up for me sporadically: sometimes the code compiles and runs, and other times it does not. The likely reason is that as I edit my code, these objects sometimes happen to land on a 16-byte boundary anyway. Go figure!

Let me know your thoughts. I think my proposed fix and safety checks will help catch future issues like this.

Cheers!

-@deckarep

Nice catch. Your fix should work.
There are only three opaque types: neco_waitgroup, neco_cond, and neco_mutex.

I'll post a PR in a bit.