Possible memory leak in pf_dcp_identify_req() / pf_dcp_responder()

JanVasina opened this issue 2 years ago

Dear friends,

I think there is a potential memory leak in the handling of DCP requests and the sending of DCP responses.
I am referring to the file pf_dcp.c dated 2022-05-18 or earlier.

Several DCP requests may arrive in very quick succession. (This happens, for example, in the Netload SL1 test of the Profinet testing suite.) In that case only the first request is handled; the others remain unhandled and their network buffers are never freed.

Detailed description:

The DCP request reaches the end of pf_dcp_identify_req(), starting at line 1903, where the response is built and the protection variable dcp_delayed_response_waiting is set:
line 1924: net->dcp_delayed_response_waiting = true;

After that, the delayed response is scheduled: the scheduler is asked to call pf_dcp_responder() later, at line 1944:

      (void)pf_scheduler_add (
         net,
         response_delay,
         pf_dcp_responder,
         p_rsp,
         &net->dcp_identresp_timeout);

In the pf_dcp_responder() function, net->dcp_delayed_response_waiting is tested, and only if it is true is the response sent and the buffer freed (line 187 onwards). Afterwards the protection variable dcp_delayed_response_waiting is cleared.

Now imagine two DCP requests in quick succession: pf_dcp_identify_req() is called twice, and only after that does the scheduler call pf_dcp_responder() twice. Because net->dcp_delayed_response_waiting is cleared at the end of the first pf_dcp_responder() call, the second call returns immediately at line 187. The second response is not sent to the controller and, what is probably worse, its p_buf is NOT released by pnal_buf_free().
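
The scenario can be reproduced in a small standalone model. This is only a sketch and not the real pf_dcp.c code: every name in it (model_net_t, model_identify_req, model_responder and so on) is hypothetical, buffers are modelled as slots in a fixed pool, and the scheduler is replaced by calling the responder directly after both requests have been handled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_POOL_SIZE 4 /* hypothetical number of response buffers */

typedef struct
{
   bool in_use[MODEL_POOL_SIZE];  /* true while a buffer is allocated */
   bool delayed_response_waiting; /* stands in for dcp_delayed_response_waiting */
   int responses_sent;
} model_net_t;

/* Take the first free slot from the pool, or return -1 if none is left. */
static int model_buf_alloc (model_net_t * net)
{
   for (int i = 0; i < MODEL_POOL_SIZE; i++)
   {
      if (!net->in_use[i])
      {
         net->in_use[i] = true;
         return i;
      }
   }
   return -1;
}

/* Models the end of the request handler: build a response, set the flag. */
static int model_identify_req (model_net_t * net)
{
   int p_rsp = model_buf_alloc (net);
   assert (p_rsp >= 0);
   net->delayed_response_waiting = true;
   return p_rsp;
}

/* Models the delayed responder: send and free only while the flag is set. */
static void model_responder (model_net_t * net, int p_rsp)
{
   if (net->delayed_response_waiting)
   {
      net->responses_sent++;
      net->in_use[p_rsp] = false;
      net->delayed_response_waiting = false;
   }
   /* Flag already cleared: the buffer is neither sent nor freed. */
}

/* Two requests arrive before either delayed response runs.
 * Returns the number of buffers still allocated afterwards. */
static int model_two_fast_requests (model_net_t * net)
{
   int first = model_identify_req (net);
   int second = model_identify_req (net);
   int leaked = 0;

   model_responder (net, first);
   model_responder (net, second); /* returns early */

   for (int i = 0; i < MODEL_POOL_SIZE; i++)
   {
      if (net->in_use[i])
      {
         leaked++;
      }
   }
   return leaked;
}
```

Calling model_two_fast_requests() on a zero-initialised model_net_t leaves one buffer allocated and only one response sent, which matches the behaviour described above.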

I think that the net->dcp_delayed_response_waiting variable is not necessary here and could be removed from the code completely. Alternatively, it could be converted to a volatile counter that is incremented in pf_dcp_identify_req(), decremented in pf_dcp_responder(), and tested against zero. In that case the maximum possible number of responses would be sent back to the controller(s).
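
A minimal sketch of the counter variant, again as a standalone model rather than a patch for pf_dcp.c. All names (counter_net_t, counter_identify_req, counter_responder, pending_responses) are hypothetical, and buffers are modelled as slots in a fixed pool. Releasing the buffer outside the counter check is an extra precaution taken in this sketch and is not part of the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>

#define COUNTER_POOL_SIZE 4 /* hypothetical number of response buffers */

typedef struct
{
   bool in_use[COUNTER_POOL_SIZE];          /* true while a buffer is allocated */
   volatile unsigned int pending_responses; /* replaces the boolean flag */
   int responses_sent;
} counter_net_t;

/* Take the first free slot from the pool, or return -1 if none is left. */
static int counter_buf_alloc (counter_net_t * net)
{
   for (int i = 0; i < COUNTER_POOL_SIZE; i++)
   {
      if (!net->in_use[i])
      {
         net->in_use[i] = true;
         return i;
      }
   }
   return -1;
}

/* Models the request handler: one more response is now pending. */
static int counter_identify_req (counter_net_t * net)
{
   int p_rsp = counter_buf_alloc (net);
   assert (p_rsp >= 0);
   net->pending_responses++;
   return p_rsp;
}

/* Models the delayed responder: send while responses are pending,
 * and release the buffer in every case. */
static void counter_responder (counter_net_t * net, int p_rsp)
{
   if (net->pending_responses > 0)
   {
      net->responses_sent++;
      net->pending_responses--;
   }
   net->in_use[p_rsp] = false;
}

/* Two requests arrive before either delayed response runs.
 * Returns the number of buffers still allocated afterwards. */
static int counter_two_fast_requests (counter_net_t * net)
{
   int first = counter_identify_req (net);
   int second = counter_identify_req (net);
   int leaked = 0;

   counter_responder (net, first);
   counter_responder (net, second);

   for (int i = 0; i < COUNTER_POOL_SIZE; i++)
   {
      if (net->in_use[i])
      {
         leaked++;
      }
   }
   return leaked;
}
```

In this model both responses are sent and no buffer stays allocated. The model is single-threaded; if the two functions can run in different threads in the real stack, the counter would need whatever locking the surrounding code already relies on, since volatile alone does not make the increment and decrement atomic.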

Now imagine a very large burst of DCP requests arriving quickly, which could happen during the Netload SL1 test. pf_dcp_identify_req() is called so many times that the scheduler runs out of resources and can no longer add pf_dcp_responder() to its queue. That causes another memory leak, because p_rsp is not freed when pf_scheduler_add() fails.
I think the call at line 1944 should be wrapped in a condition:

if (pf_scheduler_add (
       net,
       response_delay,
       pf_dcp_responder,
       p_rsp,
       &net->dcp_identresp_timeout) != 0)
{
   /* Scheduling failed: pf_dcp_responder() will never run for this buffer */
   pnal_buf_free (p_rsp);
}

I.e. if the scheduler fails, free the response buffer.
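
The effect of that check can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the real scheduler: the names (sketch_net_t, sketch_scheduler_add, sketch_identify_req and so on) and the pool and queue sizes are hypothetical, and the only property borrowed from the snippet above is that a non-zero return value means the callback was not queued.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_POOL_SIZE  8 /* hypothetical number of response buffers */
#define SKETCH_QUEUE_SIZE 2 /* hypothetical number of scheduler slots */

typedef struct
{
   bool in_use[SKETCH_POOL_SIZE]; /* true while a buffer is allocated */
   int queue[SKETCH_QUEUE_SIZE];  /* buffers waiting for the responder */
   int queued;
} sketch_net_t;

/* Take the first free slot from the pool, or return -1 if none is left. */
static int sketch_buf_alloc (sketch_net_t * net)
{
   for (int i = 0; i < SKETCH_POOL_SIZE; i++)
   {
      if (!net->in_use[i])
      {
         net->in_use[i] = true;
         return i;
      }
   }
   return -1;
}

static void sketch_buf_free (sketch_net_t * net, int buf)
{
   assert (buf >= 0 && buf < SKETCH_POOL_SIZE);
   net->in_use[buf] = false;
}

/* Returns 0 on success and -1 when the queue is full. */
static int sketch_scheduler_add (sketch_net_t * net, int buf)
{
   if (net->queued >= SKETCH_QUEUE_SIZE)
   {
      return -1;
   }
   net->queue[net->queued++] = buf;
   return 0;
}

/* Models the request handler, with or without the proposed check. */
static void sketch_identify_req (sketch_net_t * net, bool free_on_failure)
{
   int p_rsp = sketch_buf_alloc (net);
   if (p_rsp < 0)
   {
      return; /* no buffer available, nothing to schedule */
   }
   if (sketch_scheduler_add (net, p_rsp) != 0 && free_on_failure)
   {
      sketch_buf_free (net, p_rsp);
   }
}

/* Runs every queued responder; each one releases its buffer. */
static void sketch_run_scheduler (sketch_net_t * net)
{
   for (int i = 0; i < net->queued; i++)
   {
      sketch_buf_free (net, net->queue[i]);
   }
   net->queued = 0;
}

/* Returns the number of buffers still allocated after a burst. */
static int sketch_buffers_left_after_burst (int requests, bool free_on_failure)
{
   sketch_net_t net = {0};
   int left = 0;

   for (int i = 0; i < requests; i++)
   {
      sketch_identify_req (&net, free_on_failure);
   }
   sketch_run_scheduler (&net);

   for (int i = 0; i < SKETCH_POOL_SIZE; i++)
   {
      if (net.in_use[i])
      {
         left++;
      }
   }
   return left;
}
```

With a burst of five requests and two scheduler slots, the model leaves three buffers allocated when the return value is ignored and none when the buffer is freed on failure.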

With regards

Jan

MathiasPCH · Answer 1 · Tue Dec 13 2022 21:54:02 GMT+0800 (China Standard Time)

Hi JanVasina,

I can confirm that we recently discovered this memory leak in our device.
I came to the same conclusion.

Best regards

Mathias