ros-perception / image_common

Common code for working with images in ROS

Home Page: http://www.ros.org/wiki/image_common

Inconsistent error inside docker

believeInU opened this issue

I've been trying to run a node that uses a CameraSubscriber and an image_transport Publisher inside a Docker container, and I got some weird results. The node would sometimes get stuck after printing the line [setParam] Failed to contact master at [localhost:11311]. Retrying... while at other times it worked just fine, without me changing anything in the system
(in both cases roscore was running in the background, with all of its environment variables configured the same way).

Furthermore, when I ran this node as part of a bigger launch file (without starting roscore separately), most of the other nodes ran smoothly, while my node that uses image_transport would sometimes fail with the same error message.

I noticed that the problem only occurs when I declare an image_transport Publisher or Subscriber.

As a test, I wrote the following node:

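#include <ros/ros.h>
#include <image_transport/image_transport.h>

#include <cstdlib>  // EXIT_SUCCESS
#include <string>   // std::to_string

int main(int argc, char **argv)
{
    ros::init(argc, argv, "image_test");
    ros::NodeHandle nh("");

    image_transport::ImageTransport it(nh);

    // Stress test: advertise 1000 image topics one after another.
    for (auto i = 0; i < 1000; ++i) {
        ROS_INFO("%d", i);
        // tmp is destroyed at the end of each iteration, so each
        // publisher is shut down again right after it is advertised.
        auto tmp = it.advertise("image" + std::to_string(i), 1);
    }

    ros::spin();
    return EXIT_SUCCESS;
}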

Each time I ran this code, it got stuck after a different number of publishers had been declared (sometimes after just a few, and sometimes after 800).
A typical result I got when running the node above was:

[ INFO] [1610372592.047928526]: 0
[ INFO] [1610372592.206534307]: 1
[ INFO] [1610372592.212120537]: 2
...
[ INFO] [1610372595.301301137]: 358
[ INFO] [1610372595.308509332]: 359
[ERROR] [1610372595.310783823]: [setParam] Failed to contact master at [localhost:11311].  Retrying...

When I replaced the image_transport publisher with a normal ROS publisher, the node ran smoothly.
It also ran smoothly when I started it under the GDB debugger.

I tried updating the system libraries inside the Docker container (including image_transport and its plugins), removing some of the plugins, and changing the container's network settings, but nothing I tried solved the problem.

Because the problem is inconsistent, I suspected a race condition somewhere in the program, but I haven't used anything multi-threaded that could cause such a thing (except ROS itself).

As for my system, I am using ROS Melodic, with its official Docker image as the base for my container.

If someone could help me understand the source of this problem, I would really appreciate it.

Apparently, I had these two lines in my CMakeLists.txt for profiling the code, and they were what made this problem appear:

add_compile_options(-pg)
set(catkin_LIBRARIES ${catkin_LIBRARIES} -pg)

I deleted these lines and now the node is working fine.
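To double-check that a rebuilt node really is no longer instrumented, a quick runtime check can help. The sketch below is a hypothetical helper (sigprof_handler_installed is not part of ROS or image_transport). It assumes glibc on Linux, where, as far as I know, a binary linked with -pg installs a SIGPROF handler at startup for gprof's sampling timer. It only queries the current SIGPROF disposition with sigaction() and changes nothing.

```cpp
#include <signal.h>

// Hypothetical helper: reports whether a handler is currently installed for
// SIGPROF. Assumption: on glibc/Linux, a binary linked with -pg installs one
// at startup for gprof's sampling timer, so "true" hints that profiling is on.
// Other profilers (or the program itself) can also install one, so treat the
// result as a hint, not proof.
bool sigprof_handler_installed() {
    struct sigaction current {};
    // A null new action only queries the current disposition; nothing changes.
    if (sigaction(SIGPROF, nullptr, &current) != 0) {
        return false;
    }
    if (current.sa_flags & SA_SIGINFO) {
        return current.sa_sigaction != nullptr;
    }
    return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}
```

For example, calling it early in main() and printing the result with ROS_INFO would show whether the node was still built with the profiling flags.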

I originally added these lines based on this guide: http://wiki.ros.org/roslaunch/Tutorials/Profiling%20roslaunch%20nodes

I'm not sure why profiling would conflict with image_transport, but deleting these lines seems to solve the problem.
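One possible explanation, which is only a hypothesis and was not confirmed here: with -pg, gprof samples the program using a profiling timer that, as far as I know, delivers SIGPROF to the process, and on Linux a blocking poll() or select() interrupted by a signal fails with EINTR instead of being restarted (see signal(7)). Because such a signal can land on any thread at any moment, network code that treats EINTR as a failure would fail only occasionally, which would fit the inconsistent behavior described above. The sketch below reproduces just that mechanism with plain POSIX calls. poll_interrupted_by_sigprof is a hypothetical helper, and it sends SIGPROF explicitly with pthread_kill() rather than relying on gprof's timer.

```cpp
#include <cerrno>
#include <chrono>
#include <thread>

#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

namespace {
// Empty handler standing in for a profiler's sampling handler.
void on_sigprof(int) {}
}  // namespace

// Hypothetical helper: blocks in poll() on an empty pipe for up to timeout_ms
// while a helper thread sends SIGPROF to the polling thread after delay_ms.
// Returns poll()'s result and stores the errno seen right after poll() in *err.
int poll_interrupted_by_sigprof(int timeout_ms, int delay_ms, int* err) {
    struct sigaction sa {};
    struct sigaction old_sa {};
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    // SA_RESTART does not help here: Linux never restarts poll()/select()
    // after a signal handler runs, see signal(7).
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, &old_sa) != 0) {
        *err = errno;
        return -2;
    }

    int fds[2];
    if (pipe(fds) != 0) {
        *err = errno;
        sigaction(SIGPROF, &old_sa, nullptr);
        return -2;
    }

    const pthread_t poller = pthread_self();
    std::thread sender([poller, delay_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
        pthread_kill(poller, SIGPROF);  // interrupt the thread blocked in poll()
    });

    pollfd pfd{fds[0], POLLIN, 0};  // nothing is ever written to this pipe
    errno = 0;
    const int rc = poll(&pfd, 1, timeout_ms);
    *err = errno;

    sender.join();
    close(fds[0]);
    close(fds[1]);
    sigaction(SIGPROF, &old_sa, nullptr);  // restore the previous disposition
    return rc;
}
```

Calling poll_interrupted_by_sigprof(5000, 100, &err) should return -1 with err == EINTR after roughly 100 ms instead of waiting the full 5 seconds. Code that has to coexist with such signals usually retries the call when errno is EINTR.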