xebialabs / overthere

Runs something "Over there"

Home Page: http://www.xebialabs.com

Strange behavior on long running commands

cdsteinf opened this issue · comments

I have a long-running Python script with extended periods where nothing is written to stdout or stderr. When I use WINRM_INTERNAL (running on Linux) with this package, I observe that no HTTP traffic is generated while waiting for the task to complete. I have set very long timeouts on both the server and client side as appropriate, and everything appears to "work". My problem appears to be that something in my network (perhaps a firewall) closes sockets that have been idle for 1 hour. When I look at a Wireshark trace, I see that the HTTP 200/Done message is generated, but because the socket has been closed, it is retried for 20 seconds and then an RST is sent. The machine running the Overthere package never sees the response and hangs forever.

When I run the same command using winrs on a Windows box, I observe that this tool uses the WinRM protocol differently: it uses a short timeout and keeps looping through HTTP 500 "incomplete" errors until the task finishes. When the task finishes, an HTTP 200 is returned and processed by winrs correctly, because the HTTP traffic from all of the HTTP 500 messages kept the socket open.

In short: is there a way to have the Overthere package generate some kind of keep-alive in WINRM_INTERNAL mode, so that my unique building network does not prematurely close the network connection? If I set a short WinRM timeout in Overthere, the HTTP 500 message is generated, but it is not handled in a way that lets the command continue; instead, Overthere throws an exception.
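To make the winrs-style behaviour concrete, here is a minimal Python sketch of the polling pattern described above: use a short operation timeout, treat each "incomplete" reply as a signal to ask again, and stop on the final reply. It is not Overthere code and it does not speak WinRM. `receive_once` is a hypothetical callable standing in for one HTTP request/response round trip, and the status strings `"incomplete"` and `"done"` are invented labels for the HTTP 500 and HTTP 200 cases.

```python
def run_with_polling(receive_once, max_polls=None):
    """Poll until the remote command reports completion.

    receive_once is a hypothetical callable that performs one round trip and
    returns (status, chunk):
      "incomplete" -> the short timeout elapsed first (the HTTP 500 case)
      "done"       -> the command finished (the HTTP 200 case)
    Returns the collected output and the number of round trips made.
    """
    output = []
    polls = 0
    while True:
        status, chunk = receive_once()
        polls += 1
        output.append(chunk)  # keep any partial output that arrived
        if status == "done":
            return "".join(output), polls
        if status != "incomplete":
            raise RuntimeError("unexpected status: %r" % (status,))
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("gave up after %d polls" % polls)
        # "incomplete": loop and ask again; each round trip is real traffic,
        # so the connection never sits idle for long.


# Simulated server: two timeouts, then the final reply.
responses = iter([
    ("incomplete", ""),
    ("incomplete", "step 1\n"),
    ("done", "finished\n"),
])
output, polls = run_with_polling(lambda: next(responses))
```

The point of the pattern is that the short timeout is treated as a normal event rather than a failure, so the idle period between packets is bounded by that timeout rather than by how long the command runs.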

I believe we ran into a similar (or the same) issue.
We set net.ipv4.tcp_keepalive_time (on the client host) to a value lower than the idle timeout on the firewall/gateway. In theory, this should cause the OS to send keepalive probes (ACKs) once the connection is idle.
We have not tested it thoroughly enough to say with 100% certainty, but the limited testing we've done shows promising results.
Hopefully, this can help you @cdsteinf
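One caveat about the sysctl approach: on Linux, net.ipv4.tcp_keepalive_time only applies to sockets that have SO_KEEPALIVE enabled, and whether Overthere's HTTP client enables it is not established in this thread. As a rough illustration of what the setting controls, the Python sketch below enables keepalive on a single socket and, where the platform exposes them, sets the per-socket equivalents of the kernel values. The helper name `enable_keepalive` and the timing values are assumptions chosen for the example, not recommendations.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Enable TCP keepalive on one socket and return the options applied.

    idle_s     -> seconds of idleness before the first probe
    interval_s -> seconds between probes
    probes     -> unanswered probes before the connection is dropped
    """
    applied = {}
    # Without SO_KEEPALIVE the kernel sends no probes for this socket,
    # whatever the system-wide keepalive settings say.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied["SO_KEEPALIVE"] = 1
    # The per-socket overrides are platform dependent, so only set the
    # ones this Python build exposes.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

For the firewall case described in this issue, the idle value would need to be comfortably below the 1 hour cutoff so that a probe crosses the firewall before the connection is considered dead.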