Concurrent polling of coro::net::tcp_client is not safe
ripose-jp opened this issue · comments
I'm writing a system where a `coro::net::tcp_server` listens for incoming connections and does two things when receiving an incoming connection:
- Creates a coroutine that forever calls `poll(coro::poll_op::read)`, `recv()`, and puts structured data in a `ring_buffer`. This coroutine lives in its own `task_container`.
- Saves the connection in a member vector so that another coroutine can send data by calling `poll(coro::poll_op::write)` and `send()` when outgoing data is ready.
The issue arises at the `poll` call. I believe that all `coro::net::tcp_client` instances share an epoll file descriptor, so when two coroutines concurrently try to poll for `EPOLLIN` and `EPOLLOUT` separately, it is possible for `epoll_ctl` to fail with `EEXIST` right here.
I don't think that epolling for data concurrently should cause error conditions, but I'm having trouble thinking of ways to fix the current API design.
My first thought was separating `poll_status::event` into `poll_status::read` and `poll_status::write` so that two coroutines could poll concurrently. If a reader coroutine gets a `poll_status::write` it can ignore it, and vice versa. The problem with this approach is that after `tcp_client::poll()` returns, I believe it unregisters the file descriptor from epoll, which would leave the other coroutine in an undefined state. Some sort of ref count would need to be included in `poll_info` or something to prevent this. It is also a breaking API change.
Currently I've modified this code, which seems to fix the issue in the case of a `coro::io_scheduler` using inline task processing. I have not tested it using a thread pool yet. Regardless, I have misgivings about its correctness, so I'm interested in finding a better solution to the problem.
if (epoll_ctl(m_epoll_fd, EPOLL_CTL_ADD, fd, &e) == -1)
{
    if (errno != EEXIST || epoll_ctl(m_epoll_fd, EPOLL_CTL_MOD, fd, &e) == -1)
    {
        std::cerr << "epoll ctl error on fd " << fd << "\n";
    }
}
Here's a "minimal" example I could create.
#include <coro/coro.hpp>
#include <iostream>

int main()
{
    auto scheduler = std::make_shared<coro::io_scheduler>(coro::io_scheduler::options{
        .execution_strategy = coro::io_scheduler::execution_strategy_t::process_tasks_inline});

    constexpr size_t REQUEST_SIZE{100 * 1024 * 1024};

    auto make_bogus_read_task = [](std::shared_ptr<coro::net::tcp_client> client) -> coro::task<void>
    {
        using namespace std::chrono_literals;
        co_await client->poll(coro::poll_op::read, 10s);
    };

    auto make_server_task = [&]() -> coro::task<std::string>
    {
        coro::net::tcp_server server{scheduler};
        co_await scheduler->schedule();
        auto poll_status = co_await server.poll();
        if (poll_status != coro::poll_status::event)
        {
            co_return {};
        }
        std::shared_ptr<coro::net::tcp_client> client =
            std::make_shared<coro::net::tcp_client>(server.accept());
        if (!client->socket().is_valid())
        {
            co_return {};
        }
        scheduler->schedule(make_bogus_read_task(client));
        std::string request(REQUEST_SIZE, 0xAA);
        std::span<const char> remaining{request};
        do
        {
            auto [send_status, r] = client->send(remaining);
            if (send_status != coro::net::send_status::ok)
            {
                co_return {};
            }
            if (r.empty())
            {
                break;
            }
            remaining = r;
            auto pstatus = co_await client->poll(coro::poll_op::write);
            if (pstatus != coro::poll_status::event)
            {
                co_return {};
            }
        } while (true);
        co_return request;
    };

    auto make_client_task = [&]() -> coro::task<std::string>
    {
        co_await scheduler->schedule();
        coro::net::tcp_client client{scheduler};
        auto status = co_await client.connect();
        if (status != coro::net::connect_status::connected)
        {
            co_return {};
        }
        std::string response(REQUEST_SIZE, '\0');
        std::span<char> remaining{response};
        do
        {
            auto poll_status = co_await client.poll(coro::poll_op::read);
            if (poll_status != coro::poll_status::event)
            {
                co_return {};
            }
            auto [recv_status, r] = client.recv(remaining);
            if (recv_status != coro::net::recv_status::ok)
            {
                co_return {};
            }
            remaining = remaining.subspan(r.size_bytes());
        } while (!remaining.empty());
        co_return response;
    };

    auto result = coro::sync_wait(coro::when_all(make_server_task(), make_client_task()));
    std::string request  = std::move(std::get<0>(result).return_value());
    std::string response = std::move(std::get<1>(result).return_value());
    if (request.empty())
    {
        std::cout << "Request was empty\n";
    }
    else if (response.empty())
    {
        std::cout << "Response was empty\n";
    }
    else if (request != response)
    {
        std::cout << "Request != Response\n";
    }
    else
    {
        std::cout << "Request == Response\n";
    }
}
I think I understand: it is two coroutines trying to register a read and a write concurrently, and since one is already registered, the second one fails, correct? I hadn't really envisioned this use case; I had pretty much always assumed one thread would "own" the socket or file descriptor and issue the read or write as needed (or both with the read/write flag).
To allow for this behavior, libcoro could track the current epoll state of each file descriptor and check that before calling epoll_ctl. I had thought of this before but didn't do it since it incurs a lookup cost. On the flip side, as you noticed, the current implementation always de-registers unconditionally, which is another cost since it's a kernel call. I think more shared state is possibly the right way to solve this, but it will require some synchronization in the io_scheduler.
- Issue Read | Write | Read+Write.
- Check the io_scheduler's file descriptor state to see if the fd is already registered.
- If already registered, `EPOLL_CTL_MOD` the new state in if it differs from what is already registered.
- If it isn't registered, `EPOLL_CTL_ADD` as it does today.
- Don't `EPOLL_CTL_DEL` unconditionally; add a hook into the `coro::net::socket` class that triggers the io_scheduler to `EPOLL_CTL_DEL` when the class destructs. This should remove the need to unconditionally de-register when the `poll_info` is done.
These are my initial thoughts on possibly fixing this. I'll give it some more thought, though, since this is a use case I hadn't really envisioned.
> I think I understand, it is two coroutines trying to register a read and a write concurrently, and since once already register the second one fails, correct?
I believe that's the issue, since the code works perfectly without this line:
`scheduler->schedule(make_bogus_read_task(client));`
The bullet points sound correct, though it would be useful if poll only returned `coro::poll_status::event` when the event it was actually waiting for occurred. For example, `EPOLLIN | EPOLLOUT` is set, but one thread is waiting on `poll(coro::poll_op::read)` and the other is waiting on `poll(coro::poll_op::write)`.
I'll have some time to kill tomorrow, so I might work on a fix based on your outline. Thanks for the detailed and quick reply.
Another option:
> "What happens if you register the same file descriptor on an epoll instance twice?
>
> You will probably get EEXIST. However, it is possible to add a duplicate ([dup(2)](https://man7.org/linux/man-pages/man2/dup.2.html), [dup2(2)](https://man7.org/linux/man-pages/man2/dup2.2.html), [fcntl(2)](https://man7.org/linux/man-pages/man2/fcntl.2.html) F_DUPFD) file descriptor to the same epoll instance. This can be a useful technique for filtering events, if the duplicate file descriptors are registered with different events masks."
So it might be possible for you to dup your file descriptor and register the read and write in the two coroutines without any libcoro modifications.
It's probably still worthwhile for the io_scheduler to track polls at the event level and allow for your use case without dup, but I'd be curious whether this works around the problem.
It sounds like it might solve the problem so long as fd1 and fd2 don't share event masks. I don't have access to a Linux machine right now, so I can't actually build this code, but here's a patch that might solve the issue. The idea is to un-delete the copy constructor of `tcp_client` and have it `dup` the file descriptor for the socket. This also solves the issue of two coroutines sharing a `tcp_client`, since a `shared_ptr` no longer needs to be allocated. Though it does decrease safety, since it makes it easier for a user to do multiple concurrent reads or multiple concurrent writes, which will never be safe. It also makes it easier to accidentally copy a client when you don't mean to.
diff --git a/include/coro/net/tcp_client.hpp b/include/coro/net/tcp_client.hpp
index 1d91ac2..2f7b1a4 100644
--- a/include/coro/net/tcp_client.hpp
+++ b/include/coro/net/tcp_client.hpp
@@ -53,9 +53,9 @@ public:
         .ssl_ctx = nullptr
 #endif
     });
-    tcp_client(const tcp_client&) = delete;
+    tcp_client(const tcp_client&);
     tcp_client(tcp_client&& other);
-    auto operator=(const tcp_client&) noexcept -> tcp_client& = delete;
+    auto operator=(const tcp_client&) noexcept -> tcp_client&;
     auto operator=(tcp_client&& other) noexcept -> tcp_client&;
     ~tcp_client();
diff --git a/src/net/tcp_client.cpp b/src/net/tcp_client.cpp
index 4ce70b3..46b055d 100644
--- a/src/net/tcp_client.cpp
+++ b/src/net/tcp_client.cpp
@@ -32,6 +32,18 @@ tcp_client::tcp_client(std::shared_ptr<io_scheduler> scheduler, net::socket sock
     m_socket.blocking(coro::net::socket::blocking_t::no);
 }
+tcp_client::tcp_client(const tcp_client& other)
+    : m_io_scheduler(other.m_io_scheduler),
+      m_options(other.m_options),
+      m_socket(dup(other.m_socket.native_handle())),
+      m_connect_status(other.m_connect_status)
+#ifdef LIBCORO_FEATURE_SSL
+      ,
+      m_ssl_info(other.m_ssl_info)
+#endif
+{
+}
+
 tcp_client::tcp_client(tcp_client&& other)
     : m_io_scheduler(std::move(other.m_io_scheduler)),
       m_options(std::move(other.m_options)),
@@ -59,6 +71,21 @@ tcp_client::~tcp_client()
 #endif
 }
+auto tcp_client::operator=(const tcp_client& other) noexcept -> tcp_client&
+{
+    if (std::addressof(other) != this)
+    {
+        m_io_scheduler   = other.m_io_scheduler;
+        m_options        = other.m_options;
+        m_socket         = {dup(other.m_socket.native_handle())};
+        m_connect_status = other.m_connect_status;
+#ifdef LIBCORO_FEATURE_SSL
+        m_ssl_info       = other.m_ssl_info;
+#endif
+    }
+    return *this;
+}
+
 auto tcp_client::operator=(tcp_client&& other) noexcept -> tcp_client&
 {
     if (std::addressof(other) != this)
I'll mess around with some ideas for the first solution you mentioned as well.
I think we should maybe just make the socket class copyable and have it call dup()? Then each class that uses it can leverage that to make itself copyable too.
I'm going to try and roll the socket copy-able change into this: #216
I've been mulling on cleaning up the network stuff for a bit, so I think it'll naturally fit in here.
I've merged the network layout refactor (sorry, it'll probably require you to update your namespaces, but I think it's cleaner for the tls section at least).
I've also added a test #226 in a new PR if you want to read through it and make sure it is solving the behavior you were trying to use.
Thanks for the quick work. I had written a patch earlier today based on your copyable socket idea and it seemed to work. I tested the latest 4f3684b and everything worked the same after changing the namespaces. All in all LGTM.
Regarding #226, the test looks fine in principle, but you have to send more data to get the problem to occur. I only noticed the issue when sending ~4MB or more of data. I'm sure this size is generally OS dependent though.
I had assumed the larger amount was to make sure they both triggered at the same time, so I added a wait to make sure the bogus read was for sure registered in epoll; if I remove the wait, it doesn't trigger consistently.
Getting some weird linker error on the PR though so I'll need to work through that before I merge it. Glad the copyable socket is working for you though.