zeromq / jzmq

Java binding for ZeroMQ

Home Page:http://www.zeromq.org

Resource leak when simultaneously creating a socket pair from two threads

sjlnk opened this issue

Hi,

I think this is jzmq specific because I can't reproduce it in C.

When I create a socket pair simultaneously, with the bind side on one thread and the connect side on another, the context ends up in a state where it cannot clean up its resources properly after destroy is called.

Test bench:

  • Linux 3.13
  • libzmq 4.1.4

Reproduction in Java:

import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Socket;
import java.lang.management.ManagementFactory;
import java.io.IOException;

class Main
{

    public static void main(String[] args) throws InterruptedException, IOException
    {
        String pid_info = ManagementFactory.getRuntimeMXBean().getName();
        System.out.println(pid_info);

        for (long i = 0; i < 1000000000; i++)
        {
            final ZContext ctx = new ZContext();

            final String addr = "inproc://test";

            new Thread() {
                public void run()
                {
                    Socket s = ctx.createSocket(ZMQ.PUSH);
                    s.connect(addr);
                }
            }.start();
            new Thread() {
                public void run()
                {
                    try
                    {
                        // make this 10 ms and the resource leak is gone!
                        Thread.sleep(0);
                    } catch (InterruptedException e) {}
                    Socket s = ctx.createSocket(ZMQ.PULL);
                    s.bind(addr);
                }
            }.start();

            Thread.sleep(30);
            ctx.destroy();
            System.out.print("\r" + i);
            System.out.flush();
        }
    }

}

When you run this code, you can see that the process creates new threads faster than it destroys them. It also opens more file descriptors than it closes. If you keep it running for a while, the JVM crashes when it runs out of resources.
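On Linux, one way to watch both numbers from inside the process is to count the entries under /proc/self/task (native threads) and /proc/self/fd (open descriptors) on each loop iteration. The following is a minimal sketch, not part of the original test; the class and method names are made up for illustration, and it assumes a Linux /proc filesystem like the one on the test bench. The JVM's own thread statistics would likely miss libzmq's I/O threads, since those are created natively.

```java
import java.io.File;

class ProcStat
{
    // Counts the entries of a directory. Returns -1 if the directory
    // cannot be listed (for example on a system without /proc).
    static int countEntries(String dir)
    {
        String[] names = new File(dir).list();
        return names == null ? -1 : names.length;
    }

    // Each native thread of this process has one entry in /proc/self/task.
    static int nativeThreads()
    {
        return countEntries("/proc/self/task");
    }

    // Each open file descriptor of this process has one entry in /proc/self/fd.
    // The number is approximate: listing the directory briefly opens a descriptor.
    static int openFds()
    {
        return countEntries("/proc/self/fd");
    }

    public static void main(String[] args)
    {
        System.out.println("threads=" + nativeThreads() + " fds=" + openFds());
    }
}
```

Printing ProcStat.nativeThreads() and ProcStat.openFds() next to the loop counter should make a steady upward trend visible well before the JVM runs out of resources.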

If you open the sockets at different times (even 10 milliseconds apart), the problem is gone and no resources are leaked.

This is my attempt at recreating the situation in C:

#include <zmq.h>
#include <error.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <pthread.h>

void ckz(int rc)
{
    if (rc == -1)
    {
        fprintf(stderr, "%s:%d ERROR: %s (%d)\n",
                __FILE__, __LINE__, zmq_strerror(zmq_errno()), zmq_errno());
        exit(1);
    }
}

void ckzp(void* p)
{
    if (!p)
    {
        fprintf(stderr, "%s:%d ERROR: %s (%d)\n",
                __FILE__, __LINE__, zmq_strerror(zmq_errno()), zmq_errno());
        exit(1);
    }
}

void ck(int rc)
{
    if (rc)
    {
        fprintf(stderr, "%s:%d ERROR: \n",
                __FILE__, __LINE__);
        exit(1);
    }
}

/* pthread_create expects a start routine of type void *(*)(void *). */
void start_thread(void *(*f) (void*), void* args)
{
    pthread_t t;
    ck(pthread_create(&t, NULL, f, args));
    ck(pthread_detach(t));
}

const char* addr = "inproc://test";

void *pull_thread(void *ctx)
{
    //usleep(1000 * 1);
    void *pull = zmq_socket(ctx, ZMQ_PULL);
    ckzp(pull);
    ckz(zmq_bind(pull, addr));
    ckz(zmq_close(pull));
    return NULL;
}

void *push_thread(void *ctx)
{
    //usleep(1000 * 100);
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    ckzp(push);
    ckz(zmq_connect(push, addr));
    ckz(zmq_close(push));
    return NULL;
}

int main(void)
{
    printf("PID: %d\n", getpid());
    int i;
    for (i = 0; i < 100000000; i++)
    {
        printf("\r%d", i);
        fflush(stdout);

        void *ctx = zmq_ctx_new();
        ckzp(ctx);

        start_thread(pull_thread, ctx);
        start_thread(push_thread, ctx);

        usleep(1000 * 30);

        ckz(zmq_ctx_term(ctx));
    }

    return 0;
}

In the C version there is no resource leak, so this seems to be jzmq specific.

ZContext is not actually thread safe, and it doesn't even try to be. I didn't realize this. Maybe it should be? If not, I think that should be clearly documented.
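I haven't pinned down where exactly ZContext goes wrong, so what follows is an assumption about the cause, not something confirmed from the jzmq source. One pattern that could produce these symptoms is unsynchronized lazy initialization: two threads both find an internal field still null, each creates its own underlying resource, and only the one stored last is ever released. The sketch below models that with plain Java. Resource, UnsafeHolder and SafeHolder are hypothetical stand-ins, not jzmq classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitDemo
{
    // Counts how many stand-in resources have been constructed.
    static final AtomicInteger created = new AtomicInteger();

    // Hypothetical stand-in for an expensive native resource.
    static class Resource
    {
        Resource() { created.incrementAndGet(); }
    }

    // Check-then-act without synchronization: two threads can both see
    // null and each construct a Resource; the overwritten one is lost.
    static class UnsafeHolder
    {
        private Resource r;
        Resource get()
        {
            if (r == null) r = new Resource();
            return r;
        }
    }

    // Same logic, but the check and the assignment happen under one lock.
    static class SafeHolder
    {
        private Resource r;
        synchronized Resource get()
        {
            if (r == null) r = new Resource();
            return r;
        }
    }

    // Releases two threads at the same moment, runs the getter on both,
    // and returns how many Resources were constructed in total.
    static int createdByTwoThreads(Runnable getter) throws InterruptedException
    {
        created.set(0);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++)
        {
            threads[i] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { return; }
                getter.run();
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) t.join();
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException
    {
        UnsafeHolder unsafe = new UnsafeHolder();
        SafeHolder safe = new SafeHolder();
        // The unsafe count is timing dependent: it can be 1 or 2.
        System.out.println("unsafe: " + createdByTwoThreads(() -> unsafe.get()));
        System.out.println("safe:   " + createdByTwoThreads(() -> safe.get()));
    }
}
```

If the real cause is something like this, then creating both sockets from a single thread, or guarding every createSocket call with one shared lock, should work around it until ZContext is either made thread safe or documented as not being so.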