p4lang / ptf

Packet Test Framework

PI nanomsg hangs if the corresponding switch crashes

fruffy opened this issue

I have been successfully using nanomsg instead of virtual interfaces to run simple_switch. An example is this PR:
p4lang/p4c#3951
However, there is one problem. If simple_switch crashes, the nn framework seems to get stuck at this point:

self.packet_inject.port_remove(self.port_number)

It tries to remove the port but does not get a response back, presumably because the socket is blocking indefinitely? Is there a straightforward way to fix this?
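One way to keep teardown from hanging, sketched here with only the standard library, is to run the blocking call in a daemon thread and give up after a timeout. The helper name `call_with_timeout` and the 5-second default are assumptions for illustration and are not part of PTF. The stuck call is not cancelled; the thread is simply abandoned so the test process can continue and exit.

```python
import threading


def call_with_timeout(fn, *args, timeout=5.0):
    """Run fn(*args) in a daemon thread.

    Returns True if the call returned (or raised) within `timeout` seconds,
    False if it is still blocked. The hung thread is left behind; because it
    is a daemon thread it will not keep the interpreter alive.
    """
    done = threading.Event()

    def worker():
        try:
            fn(*args)
        finally:
            # Signal completion even if fn raised.
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)
```

In the teardown path this could be used as `call_with_timeout(self.packet_inject.port_remove, self.port_number)`, logging a warning when it returns False.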

It's not supported by PTF currently, but I would recommend the gRPC service for injecting packets (https://github.com/p4lang/behavioral-model/blob/main/services/p4/bm/dataplane_interface.proto), instead of the nanomsg socket. You could consider adding that support to PTF...

The nanomsg stuff we use doesn't seem to be actively maintained any more. There are new implementations (both for the core library and the Python module). I don't know if using these would help.

I can think of the following workaround (not guaranteed to work): force-delete the socket files once you detect that simple_switch_grpc has crashed. I assume this is an unplanned crash (e.g. a segfault), and not a signal we could trap in bmv2 to close the socket before exiting.
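A minimal sketch of that workaround, assuming the switch was started through `subprocess.Popen` (or anything else with a `poll()` method) and that the nanomsg endpoints are `ipc://` addresses backed by files. The function name `remove_stale_ipc_files` and its parameters are hypothetical, and whether deleting the files actually unblocks the pending call is untested.

```python
import os


def remove_stale_ipc_files(switch_proc, ipc_addrs):
    """Delete the files behind ipc:// addresses if the switch has exited.

    switch_proc: object with a poll() method (e.g. subprocess.Popen).
    ipc_addrs:   iterable of nanomsg addresses; non-ipc ones are skipped.
    Returns the list of paths that were removed.
    """
    if switch_proc.poll() is None:
        # Switch is still running; leave its sockets alone.
        return []

    removed = []
    for addr in ipc_addrs:
        if not addr.startswith("ipc://"):
            continue
        path = addr[len("ipc://"):]
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            # Already gone, nothing to do.
            pass
    return removed
```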

Thanks, will try to give the hack a shot.