Undefined behavior in circular buffer
VasyaPRO opened this issue
The circular buffer implementation that uses page mapping, shown in the recent video (which is a great video, by the way), behaves inconsistently at different optimization levels. This is likely caused by undefined behavior. The probable cause is alias analysis: modern compilers aggressively assume that two different addresses never refer to the same memory, and use that assumption to optimize code. The following code uses the circular buffer defined in perfaware/part3/listing_0121_circular_buffer_main.cpp:
```cpp
int main(void)
{
    printf("Circular buffer test:\n");
    const size_t BUF_SIZE = 64 * 4096;
    circular_buffer Circular = AllocateCircularBuffer(BUF_SIZE, 3);
    if(IsValid(Circular))
    {
        u8 *Data = Circular.Base.Data + BUF_SIZE;
        Data[0] = 1;
        Data[BUF_SIZE] = 2; // one buffer size further on: the same physical byte as Data[0]
        printf("%u\n", Data[0]); // expected to print 2
        DeallocateCircularBuffer(&Circular);
    }
    else
    {
        printf(" FAILED\n");
    }

    // NOTE(casey): Since we do not use these functions in this particular build, we reference their pointers
    // here to prevent the compiler from complaining about "unused functions".
    (void)&IsInBounds;
    (void)&AreEqual;
    (void)&AllocateBuffer;
    (void)&FreeBuffer;
    return 0;
}
```
With optimizations off (`cl /Od`, `g++ -O0`, `clang++ -O0`), every compiler produces the expected output:

```
Circular buffer test:
2
```
But with optimizations on (`cl /O2`, `g++ -O2`, `clang++ -O2`), it gives the following output:

```
Circular buffer test:
1
```
It seems that compilers assume writing to `Data[BUF_SIZE]` cannot possibly affect the value of `Data[0]`, so they can safely pass the known value of `Data[0]` directly to `printf`.
Here is the assembly generated by `g++ -O2` (g++ 13.1, mingw-w64):

```asm
140007eba: c6 80 00 00 04 00 01    mov BYTE PTR [rax+0x40000],0x1  ; write 1 to Data[0]
140007ec1: 48 8d 0d 8b 21 00 00    lea rcx,[rip+0x218b]
140007ec8: ba 01 00 00 00          mov edx,0x1                     ; put 1 directly into printf args
140007ecd: c6 80 00 00 08 00 02    mov BYTE PTR [rax+0x80000],0x2  ; write 2 to Data[BUF_SIZE]
140007ed4: e8 f7 fd ff ff          call 140007cd0 <_Z6printfPKcz>  ; call printf
```
And here is the assembly generated by `g++ -O0`:

```asm
140001aec: c6 00 01                mov BYTE PTR [rax],0x1          ; write 1 to Data[0]
140001aef: 48 8b 45 f0             mov rax,QWORD PTR [rbp-0x10]
140001af3: 48 05 00 00 04 00       add rax,0x40000
140001af9: c6 00 02                mov BYTE PTR [rax],0x2          ; write 2 to Data[BUF_SIZE]
140001afc: 48 8b 45 f0             mov rax,QWORD PTR [rbp-0x10]
140001b00: 0f b6 00                movzx eax,BYTE PTR [rax]        ; read Data[0] again
140001b03: 0f b6 c0                movzx eax,al
140001b06: 89 c2                   mov edx,eax                     ; put the value of Data[0] into printf args
140001b08: 48 8d 05 6b 85 00 00    lea rax,[rip+0x856b]
140001b0f: 48 89 c1                mov rcx,rax
140001b12: e8 39 68 00 00          call 140008350 <_Z6printfPKcz>  ; call printf
```
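The optimization visible in the `-O2` listing can be reproduced in isolation. The sketch below is not from the issue: `StoreStoreReload` is a hypothetical name, and `BUF_SIZE` is kept a compile-time constant as in the test program, so the compiler can prove that `Data[0]` and `Data[BUF_SIZE]` are different addresses. On ordinary memory `Data[0]` really is still 1 after both stores, so returning the constant 1 without reloading is a legal optimization; only a second mapping of the same page makes it observably wrong.

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint8_t u8;

constexpr size_t BUF_SIZE = 64 * 4096;

// Hypothetical reduction of the test program: store, store, reload.
// Because Data + 0 and Data + BUF_SIZE are different addresses, an
// optimizing compiler may assume the second store cannot change Data[0]
// and return the constant 1 without reading memory again.
static unsigned StoreStoreReload(u8 *Data)
{
    Data[0] = 1;
    Data[BUF_SIZE] = 2; // a different byte, as far as the compiler can tell
    return Data[0];     // may be folded to the constant 1 at -O2
}
```

With a plain array of `BUF_SIZE + 1` bytes, both `-O0` and `-O2` builds return 1, which is exactly why the compiler is allowed to skip the reload.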
Sorry if this isn't the right place to discuss this, but YouTube comments are disabled, and Computerenhance comments are for subscribers only. I believe it should be mentioned somewhere that circular buffers of this kind are not really safe to use with modern compilers unless someone figures out how to reliably tell the compiler that this kind of page manipulation is involved.
I can certainly add a comment to that effect, although I've never seen any actual code you would hand a circular buffer to that does this (in general, if you are writing more than one buffer's worth of data, as this test does, then it's unclear what you would want to happen in the circular buffer case anyway, since the output is larger than the buffer to begin with).
Separately, if you do want this to work for some reason, you can add `volatile` to the pointer (`volatile u8 *Data`) so the compiler knows it can't optimize based on assumed values.
- Casey
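A sketch of the `volatile` workaround, made self-contained for Linux. `MapMirrored` is a hypothetical stand-in for `AllocateCircularBuffer` (whose implementation this thread does not show); it assumes Linux with glibc 2.27 or later for `memfd_create`, and a `Size` that is a multiple of the page size. `MirrorTestVolatile` repeats the issue's access pattern through a `volatile u8 *`, so every store and the final load must actually be performed, and the load observes the write made through the other mapping.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // memfd_create is a GNU extension (g++ usually predefines this)
#endif
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef std::uint8_t u8;

// Hypothetical Linux stand-in (not the perfaware code): maps the same Size
// bytes RepCount times back to back. Returns nullptr on failure.
static u8 *MapMirrored(size_t Size, size_t RepCount)
{
    int Fd = memfd_create("circular", 0);
    if(Fd < 0)
    {
        return nullptr;
    }

    u8 *Base = nullptr;
    if(ftruncate(Fd, static_cast<off_t>(Size)) == 0)
    {
        // Reserve one contiguous address range, then replace each slice with
        // a shared view of the same file so every slice shows the same bytes.
        void *Reserved = mmap(nullptr, Size * RepCount, PROT_NONE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if(Reserved != MAP_FAILED)
        {
            Base = static_cast<u8 *>(Reserved);
            for(size_t Rep = 0; Rep < RepCount; ++Rep)
            {
                void *View = mmap(Base + Rep * Size, Size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_FIXED, Fd, 0);
                if(View == MAP_FAILED)
                {
                    munmap(Base, Size * RepCount);
                    Base = nullptr;
                    break;
                }
            }
        }
    }

    close(Fd); // the mappings keep the memory alive after the descriptor closes
    return Base;
}

// Same access pattern as the issue's test program, but through a pointer to
// volatile: the compiler must emit both stores and reload Data[0].
static unsigned MirrorTestVolatile(u8 *Base, size_t Size)
{
    volatile u8 *Data = Base + Size;
    Data[0] = 1;
    Data[Size] = 2; // lands on the same physical byte via the next mapping
    return Data[0]; // reloaded from memory, so it sees the 2
}
```

Dropping the `volatile` qualifier gives the optimizer back the freedom to return 1. Note that `volatile` forces every access through that pointer to be emitted, so it is a blunt tool for hot loops.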
When I use this kind of "magic" ring buffer with the virtual memory trick, I usually access it only through a variable index that the compiler knows nothing about at compile time, and I only read/write it forwards. Never with static offsets larger than the ring buffer size. So far I have not seen compilers mess up such code.
You can see an example of such a ring buffer in my code here: https://github.com/mmozeiko/wstream/blob/main/rtmp_stream.c#L278-L355
The reader calls RB_BeginRead, gets a pointer, reads the values it wants, and calls RB_EndRead to advance the read offset. The writer works the same way with RB_BeginWrite / RB_EndWrite. An example where a write happens: https://github.com/mmozeiko/wstream/blob/main/rtmp_stream.c#L413-L424
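A sketch of the begin/end access discipline described above, where offsets only grow and are reduced modulo the buffer size only when a pointer is formed. The struct and function names are hypothetical and the signatures are not those of rtmp_stream.c. The sketch assumes a single thread (the linked code may synchronize reader and writer differently) and that `Base` points at `Size` bytes mapped at least twice back to back, so each returned pointer stays valid for the reported number of bytes even across the wrap.

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint8_t u8;

// Hypothetical single-threaded ring buffer over a mirrored mapping.
struct ring_buffer
{
    u8 *Base;           // start of the first mapping
    size_t Size;        // bytes in one mapping
    size_t ReadOffset;  // total bytes ever consumed; only grows
    size_t WriteOffset; // total bytes ever produced; only grows
};

// Returns where the writer may write next; *Available receives the free space.
static u8 *RingBeginWrite(ring_buffer *Ring, size_t *Available)
{
    *Available = Ring->Size - (Ring->WriteOffset - Ring->ReadOffset);
    return Ring->Base + (Ring->WriteOffset % Ring->Size);
}

// Commits Written bytes (must not exceed the Available value from RingBeginWrite).
static void RingEndWrite(ring_buffer *Ring, size_t Written)
{
    Ring->WriteOffset += Written;
}

// Returns where the reader may read next; *Available receives the pending bytes.
static u8 *RingBeginRead(ring_buffer *Ring, size_t *Available)
{
    *Available = Ring->WriteOffset - Ring->ReadOffset;
    return Ring->Base + (Ring->ReadOffset % Ring->Size);
}

// Consumes Read bytes (must not exceed the Available value from RingBeginRead).
static void RingEndRead(ring_buffer *Ring, size_t Read)
{
    Ring->ReadOffset += Read;
}
```

In this pattern every pointer comes from a runtime offset and each access covers at most one buffer's worth of data, so the compiler never sees two constant offsets a full buffer size apart, which is the situation that went wrong in the test program.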