tidwall / neco

Concurrency library for C (coroutines)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Segfault in the HTTP server example code

j4cobgarby opened this issue · comments

To reproduce

  1. Download neco.{c,h} and the example HTTP server code from the README, which is:
#include <stdio.h>
#include <unistd.h>
#include "neco.h"

void request(int argc, void *argv[]) {
    int fd = *(int*)argv[0];
    char req[256];
    int n = neco_read(fd, req, sizeof(req));
    if (n > 0) {
        char res[] = "HTTP/1.0 200 OK\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: 21\r\n"
                     "\r\n"
                     "<h1>Hello Neco!</h1>\n";
        neco_write(fd, res, sizeof(res));
    }
    close(fd);
}

int neco_main(int argc, char *argv[]) {
    int servfd = neco_serve("tcp", "127.0.0.1:8080");
    if (servfd < 0) {
        printf("neco_serve: %s\n", neco_strerror(servfd));
        return 0;
    }
    printf("Serving at http://127.0.0.1:8080\n");
    while (1) {
        int fd = neco_accept(servfd, 0, 0);
        if (servfd < 0) {
            printf("neco_accept: %s\n", neco_strerror(fd));
            continue;
        }
        neco_start(request, 1, &fd);
    }
    return 0;
}
  1. gcc neco.c main.c && ./a.out
  2. It runs fine, and serves the HTML once, but segfaults immediately after.
$ ./a.out
Serving at http://127.0.0.1:8080
Segmentation fault (core dumped)

Debugging

I had a brief look at the trace in GDB:

Serving at http://127.0.0.1:8080
[New Thread 0x7ffff56006c0 (LWP 30212)]
[New Thread 0x7ffff4c006c0 (LWP 30214)]

Thread 2 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff56006c0 (LWP 30212)]
0x00007ffff7c93c34 in pthread_detach@GLIBC_2.2.5 () from /lib64/libc.so.6
(gdb) where
#0  0x00007ffff7c93c34 in pthread_detach@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x0000000000405876 in worker_entry (arg=0x41f9f0) at neco.c:2744
#2  0x00007ffff7c92bb2 in start_thread () from /lib64/libc.so.6
#3  0x00007ffff7d1400c in clone3 () from /lib64/libc.so.6
(gdb) frame 1
#1  0x0000000000405876 in worker_entry (arg=0x41f9f0) at neco.c:2744
2744	                pthread_detach(thread->th);
(gdb) print thread
$1 = (struct worker_thread *) 0x41f9f0
(gdb) print thread->th
$2 = 0

Presumably this thread->th == 0 is the cause of the problem, but I'm not familiar enough with the codebase to have a look into this any further.

Probably worth noting that other examples that I tried so far worked fine. (Nevermind, looks like I'm getting the same issue with the echo server in examples/; it segfaults before I even run the client, in fact.)

Looking a bit further into the source code, I've been able to fix this issue with the following patch:

diff --git a/neco.c b/neco.c
index 322265c..51d8110 100644
--- a/neco.c
+++ b/neco.c
@@ -2739,10 +2739,10 @@ static void *worker_entry(void *arg) {
         ts.tv_sec += 1;
         pthread_cond_timedwait(&thread->cond, &thread->mu, &ts);
         if (thread->len == 0) {
-            thread->th = 0;
             if (!thread->end) {
                 pthread_detach(thread->th);
             }
+            thread->th = 0;
             thread->end = false;
             break;
         }

(i.e., moving the thread->th = 0; until after the call to pthread_detach)

Let me know if this is a "correct" fix, or if the previous order was intended. If so, I'd be happy to submit a PR :)

Your code looks like it will fix an error related to the pthread_deatch being passed a zero value.
But I'm unable to reproduce an actual SEGV crash on my side.
What operating system and architecture are you running?

I'm running Opensuse Tumbleweed on Linux 6.8.1. In terms of architecture, it's some dell laptop with an Intel i7.
It is a strange issue, because clearly the code works fine for you, I just don't understand how -- maybe certain implementations of pthread_detach gracefully handle invalid threads?

I wonder if the difference is due to how our systems are giving IDs to threads. If I run the following code:

#include <pthread.h>
#include <stdio.h>

void *test(void *_) {
return NULL;
}

int main() {
    pthread_t th;
    pthread_create(&th, NULL, &test, NULL);

    printf("Created thread: %lu\n", th);

    pthread_detach(th);
}

I see "Created thread: 140478894835392", but depending on what OS you're on, the first thread could potentially be given the id 0? This would explain why you're not getting a segfault, however I believe in that case it's still incorrect to explicitly set the thread to 0 before detaching it.

I agree that it must be related to a different in how systems handles passing a zero to pthread_detach. I would expect it to return an ESRCH error, and gracefully continue operating. But still, you identified a valid issue with thread being set to zero before detaching. I think that will probably fix the issue.

I would accept a PR if you are willing.

That's a good point, the man page agrees with you that it should return a ESRCH error... However even with something as simple as

#include <pthread.h>

int main() {
    pthread_detach(0);
}

I get a segfault - maybe this is a bug specifically in whatever version of libc I have.

I'll submit a PR in a minute :)

Great thanks! All is merged.

thanks :)