xdp-project / xdp-tools

Utilities and example programs for use with XDP

Fill ring: write from another process

nunojpg opened this issue

Hello,

Note that the rings are single-producer single-consumer, so do not try to access them from multiple processes at the same time.
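The single-producer/single-consumer rule quoted above follows from how the ring indices work. Below is a simplified user-space model, not the real libbpf/libxdp ring helpers: the names `spsc_ring`, `ring_produce` and `ring_consume` are hypothetical, the size is fixed at 8, and C11 acquire/release atomics stand in for the barriers the kernel and libbpf use. Each index has exactly one writer, which is what lets the ring work without locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* must be a power of two so indices can be masked */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

/* Hypothetical model of an AF_XDP-style SPSC ring of frame addresses. */
struct spsc_ring {
    _Atomic uint32_t producer; /* written only by the producer */
    _Atomic uint32_t consumer; /* written only by the consumer */
    uint64_t desc[RING_SIZE];
};

/* Producer side: enqueue one frame address. Returns 0, or -1 if full. */
static int ring_produce(struct spsc_ring *r, uint64_t addr)
{
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_acquire);

    if (prod - cons == RING_SIZE)
        return -1; /* full; unsigned wrap-around keeps the difference correct */
    r->desc[prod & (RING_SIZE - 1)] = addr;
    /* Publish the descriptor before the new producer index. */
    atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
    return 0;
}

/* Consumer side: dequeue one frame address. Returns 0, or -1 if empty. */
static int ring_consume(struct spsc_ring *r, uint64_t *addr)
{
    uint32_t cons = atomic_load_explicit(&r->consumer, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->producer, memory_order_acquire);

    if (prod == cons)
        return -1; /* empty */
    *addr = r->desc[cons & (RING_SIZE - 1)];
    /* Hand the slot back only after the descriptor has been read. */
    atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
    return 0;
}
```

If two processes called `ring_produce` on the same ring at once, both could read the same `producer` value and write the same slot, so any handover of the FILL ring has to guarantee that only one process produces at a time.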

I've tried. Sorry. But it doesn't seem to be possible.

I have an application where I want to transfer responsibility for the FILL ring from one process to another, but I can't remap the ring into the second process's virtual memory.

  • It's read-only (receive path only), so only the FILL/RX rings matter.
  • The system might have only one hardware queue available, so this is about a single (interface, queue) tuple.
  • I could duplicate the socket into another process (SYS_pidfd_getfd) or just create a new socket bound to the same tuple (sxdp_shared_umem_fd).
  • I can share the UMEM through shared memory.
  • If I use two sockets, I can have independent RX rings.
  • But there is only one FILL ring, and it is mapped into the first process's virtual memory.
  • The kernel source makes it impossible to map the FILL ring again after bind (xs->fq_tmp = NULL;).
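For reference, duplicating the XSK file descriptor into process B with `pidfd_getfd` (Linux 5.6+) might look like the sketch below. The helper name `steal_fd` is hypothetical, and the syscall-number fallbacks assume the generic syscall table (as on x86-64 and arm64) for when the installed headers are too old to define them. The duplicated descriptor refers to the same socket, so on an unmodified kernel an `mmap` of the FILL ring through it after bind still fails with `EBUSY`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434  /* assumption: generic syscall table */
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438 /* assumption: generic syscall table */
#endif

/* Hypothetical helper: return a copy of `target_fd` from process `pid`
 * in the calling process, or -1 with errno set. The caller needs ptrace
 * access to `pid` (PTRACE_MODE_ATTACH_REALCREDS). */
static int steal_fd(pid_t pid, int target_fd)
{
    assert(target_fd >= 0);

    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;

    int fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);
    int saved = errno; /* keep the pidfd_getfd() error across close() */
    close(pidfd);
    errno = saved;
    return fd;
}
```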

I am looking for a solution or miracle to be able to remap the FILL ring from process A to process B.

Destroying the socket and binding a new one should ideally be avoided, since it would cause packet loss.

That is a new use case, one that I at least did not envision. Can you mmap the fill ring in both processes before you bind the socket in the first process? According to the code, this should be possible. mmap'ing the rings after bind is not allowed.
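The ordering suggested here (create the FILL ring, `mmap` it, and only then `bind`) can be sketched with the raw socket API from `<linux/if_xdp.h>`. This is a minimal sketch under stated assumptions: `fill_ring_map` and `map_fill_ring` are hypothetical names, UMEM registration (`XDP_UMEM_REG`) and the RX/completion rings are assumed to be handled elsewhere, and on an unmodified kernel this must run while the socket is still unbound.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64 /* XDP_UMEM_PGOFF_FILL_RING needs a 64-bit off_t */
#include <assert.h>
#include <errno.h>
#include <linux/if_xdp.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef SOL_XDP
#define SOL_XDP 283
#endif

/* Hypothetical container for the pieces of a mapped FILL ring. */
struct fill_ring_map {
    void *area;         /* whole mmap()ed region */
    size_t len;
    uint32_t *producer; /* advanced by user space */
    uint32_t *consumer; /* advanced by the kernel */
    uint64_t *desc;     /* frame addresses (UMEM offsets) */
};

/* Create and map the FILL ring of an AF_XDP socket that is NOT yet bound.
 * Returns 0 or -errno. */
static int map_fill_ring(int xsk_fd, uint32_t entries, struct fill_ring_map *out)
{
    struct xdp_mmap_offsets off;
    socklen_t optlen = sizeof(off);
    int size = (int)entries;

    assert(entries != 0 && (entries & (entries - 1)) == 0); /* power of two */
    if (setsockopt(xsk_fd, SOL_XDP, XDP_UMEM_FILL_RING, &size, sizeof(size)))
        return -errno;
    if (getsockopt(xsk_fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen))
        return -errno;

    out->len = off.fr.desc + entries * sizeof(uint64_t);
    out->area = mmap(NULL, out->len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_POPULATE, xsk_fd,
                     XDP_UMEM_PGOFF_FILL_RING);
    if (out->area == MAP_FAILED)
        return -errno;

    out->producer = (uint32_t *)((char *)out->area + off.fr.producer);
    out->consumer = (uint32_t *)((char *)out->area + off.fr.consumer);
    out->desc = (uint64_t *)((char *)out->area + off.fr.desc);
    return 0;
}
```

Each process that needs the ring would call this (or just the `mmap` part, on a shared descriptor) before the first `bind()`, which is exactly what a later-started hot-swap process cannot do.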

Thanks for your time.

mmap before bind is not possible, since the processes are new versions started for a hot swap, so they don't exist yet at the time of bind.

I think a more generic mechanism would be some kind of "bind-replace": a flag on bind that unbinds the previous socket bound to the same (interface, queue) tuple. A first attempt at the API:

  • Similar to .sxdp_shared_umem_fd, add a .sxdp_unbind_fd field, which Process B would use.
  • After the bind call with .sxdp_unbind_fd, the new socket becomes active.
  • The previous socket no longer receives frames but can still consume the ones already on its RX ring.
  • Similar to .need_wakeup, a .unbinded flag on the FILL ring would signal Process A that no more frames will be received.
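To pin down the proposed semantics, here is a small user-space model of the bind-replace flow. It is purely illustrative: `sxdp_unbind_fd` and the `unbinded` flag are the names proposed in this comment and do not exist in the kernel, and `model_xsk`, `model_deliver` and `model_bind_replace` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket state for the proposed bind-replace flow. */
struct model_xsk {
    bool active;         /* receives new frames for the (ifindex, queue) */
    bool unbinded;       /* proposed FILL-ring flag: no more frames coming */
    unsigned rx_pending; /* frames already placed on this socket's RX ring */
};

/* Deliver one frame to whichever socket is currently active. */
static void model_deliver(struct model_xsk *socks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (socks[i].active) {
            socks[i].rx_pending++;
            return;
        }
    }
}

/* Process B binds `newer` with sxdp_unbind_fd referring to `older`. */
static void model_bind_replace(struct model_xsk *newer, struct model_xsk *older)
{
    assert(older->active && !newer->active);
    older->active = false;  /* stop steering frames to Process A */
    older->unbinded = true; /* A may drain its RX ring, then exit */
    newer->active = true;   /* from now on Process B receives */
}
```

Frames already on the old socket's RX ring stay there for Process A to drain, which is the property that would avoid packet loss during the swap.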

Would it be sufficient if the rings could be mmapped multiple times? I'm thinking you could then just mmap the rings from the second process. To me, all the synchronization needed between the two processes is application logic, and I do not think we should complicate the xsk kernel code with it.

The current code forbids the second mmap after bind and the reason for this was just to make sure that users did not try to do anything silly. But if you know what you are doing and synchronize things properly, it should work.
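As a sketch of what such application-level synchronization could look like, the control block below lives in shared memory next to the UMEM and records which process currently owns the producer side of the FILL ring. Everything here (`fq_handover`, `fq_may_produce`, `fq_hand_over`, the integer process ids) is a hypothetical design, not part of the AF_XDP ABI, and it assumes process B already has the FILL ring mapped, either from before bind or through a modified kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical control block in shared memory; application state only. */
struct fq_handover {
    _Atomic int owner; /* id of the only process allowed to produce */
};

/* Check before touching the FILL ring's producer side. The acquire load
 * pairs with the release in fq_hand_over(), so the new owner sees every
 * ring write made by the previous owner. */
static bool fq_may_produce(struct fq_handover *h, int self)
{
    return atomic_load_explicit(&h->owner, memory_order_acquire) == self;
}

/* Called by the current owner after it has stopped producing. Fails if
 * `self` is not the owner. */
static bool fq_hand_over(struct fq_handover *h, int self, int next)
{
    int expected = self;

    assert(self != next);
    return atomic_compare_exchange_strong_explicit(
        &h->owner, &expected, next,
        memory_order_acq_rel, memory_order_acquire);
}
```

Process A stops producing, then calls `fq_hand_over(h, A, B)`; B polls `fq_may_produce(h, B)` and, once it returns true, re-reads `*producer` from the shared mapping before producing. If B uses helpers that cache indices (libbpf's `struct xsk_ring_prod` keeps `cached_prod`), those cached values must be refreshed at that point.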

From net/xdp/xsk.c xsk_mmap():

if (READ_ONCE(xs->state) != XSK_READY)
        return -EBUSY;

This check would have to be modified so that both XSK_READY and XSK_BOUND are allowed. You would then also have to change how fq_tmp and cq_tmp are read: if one of them is NULL, read fq and cq respectively instead. Maybe you could experiment with it and see if it works?

So yes, that is the direct way of fixing exactly what I want. I gave another suggestion before because I thought a more generic mechanism might be easier to get accepted.

I have tested this, and it works for me:

@@ -1056,7 +1056,8 @@ static int xsk_mmap(struct file *file, struct socket *sock,
        unsigned long pfn;
        struct page *qpg;
 
-       if (READ_ONCE(xs->state) != XSK_READY)
+       int state = READ_ONCE(xs->state);
+       if (state != XSK_READY && state != XSK_BOUND)
                return -EBUSY;
 
        if (offset == XDP_PGOFF_RX_RING) {
@@ -1067,9 +1068,9 @@ static int xsk_mmap(struct file *file, struct socket *sock,
                /* Matches the smp_wmb() in XDP_UMEM_REG */
                smp_rmb();
                if (offset == XDP_UMEM_PGOFF_FILL_RING)
-                       q = READ_ONCE(xs->fq_tmp);
+                       q = state == XSK_READY ? READ_ONCE(xs->fq_tmp) : READ_ONCE(xs->pool->fq);
                else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING)
-                       q = READ_ONCE(xs->cq_tmp);
+                       q = state == XSK_READY ? READ_ONCE(xs->cq_tmp) : READ_ONCE(xs->pool->cq);
        }
 
        if (!q)

Let me know if I should post this to the mailing list.
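With a patch like the one above applied, process B could map the FILL ring of the already-bound socket through a descriptor it obtained from process A (for example with `pidfd_getfd`). A minimal sketch, assuming the patched kernel, a hypothetical helper name, and that B learns the ring size from A through its own channel, since it must match what A configured with `XDP_UMEM_FILL_RING`. On an unpatched kernel the `mmap` fails with `EBUSY`.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64 /* XDP_UMEM_PGOFF_FILL_RING needs a 64-bit off_t */
#include <assert.h>
#include <errno.h>
#include <linux/if_xdp.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef SOL_XDP
#define SOL_XDP 283
#endif

/* Hypothetical helper for process B: map the FILL ring of an AF_XDP socket
 * that process A has already bound. Returns the mapping, or MAP_FAILED with
 * errno set. `fr_out` receives the producer/consumer/desc offsets. */
static void *map_bound_fill_ring(int xsk_fd, uint32_t entries, size_t *len_out,
                                 struct xdp_ring_offset *fr_out)
{
    struct xdp_mmap_offsets off;
    socklen_t optlen = sizeof(off);

    assert(len_out != NULL && fr_out != NULL);
    if (getsockopt(xsk_fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen))
        return MAP_FAILED;

    *fr_out = off.fr;
    *len_out = off.fr.desc + entries * sizeof(uint64_t);
    /* No setsockopt(XDP_UMEM_FILL_RING) here: the ring already exists and,
     * after bind, lives in the buffer pool (xs->pool->fq in the patch). */
    return mmap(NULL, *len_out, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_POPULATE, xsk_fd, XDP_UMEM_PGOFF_FILL_RING);
}
```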