xebialabs / overthere

Runs something "Over there"

Home Page: http://www.xebialabs.com

Intermittent Connection timeout on WinRM connections

wmunyan opened this issue

Hello,
I am seeing some very strange behavior in my program, which needs to keep a (somewhat) long-running WinRM connection open to a remote Windows box. More often than not, the program executes partially and then fails with the following:

Exception in thread "main" com.xebialabs.overthere.cifs.winrm.WinRmRuntimeIOException: Error when sending request to https://MY-SERVER-NAME:5986/wsman
Request:
<?xml version="1.0" encoding="UTF-8"?>

<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <a:To xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">https://MY-SERVER-NAME:5986/wsman</a:To>
    <a:ReplyTo xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">
      <a:Address mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:Address>
    </a:ReplyTo>
    <w:MaxEnvelopeSize xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="true">307200</w:MaxEnvelopeSize>
    <a:MessageID xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">uuid:D9CE72E3-E2EE-483D-9BEB-94BF4583FF08</a:MessageID>
    <w:Locale xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="false" xml:lang="en-US"/>
    <p:DataLocale xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd" mustUnderstand="false" xml:lang="en-US"/>
    <w:OperationTimeout xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">PT3600.000S</w:OperationTimeout>
    <a:Action xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing" mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/Receive</a:Action>
    <w:SelectorSet xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">
      <w:Selector Name="ShellId">77098E84-D91C-4E9F-B26C-36997F9F1D7C</w:Selector>
    </w:SelectorSet>
    <w:ResourceURI xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
  </env:Header>
  <env:Body>
    <rsp:Receive xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
      <rsp:DesiredStream CommandId="580F404C-D703-4A48-861C-96041F8E19CB">stdout stderr</rsp:DesiredStream>
    </rsp:Receive>
  </env:Body>
</env:Envelope>

Response:
[EMPTY]
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.doSendRequest(WinRmClient.java:435)
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.sendRequest(WinRmClient.java:345)
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.receiveOutput(WinRmClient.java:182)
        at com.xebialabs.overthere.cifs.winrm.CifsWinRmConnection$2.run(CifsWinRmConnection.java:162)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to MY-SERVER-NAME:5986 [MY-SERVER-NAME/MY-SERVER-IP] failed: Connection timed out: connect
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
        at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at com.xebialabs.overthere.cifs.winrm.WinRmClient.doSendRequest(WinRmClient.java:414)
        ... 3 more
Caused by: java.net.ConnectException: Connection timed out: connect
        at java.net.DualStackPlainSocketImpl.connect0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
        ... 12 more

I have tried numerous timeout settings, including:

connectionTimeoutMillis=0
socketTimeoutMillis=0
winrmTimeout=PT3600.000S

System.setProperty("jcifs.smb.client.connTimeout", "1200000")
System.setProperty("jcifs.smb.client.responseTimeout", "1200000")
System.setProperty("jcifs.smb.client.soTimeout", "1200000")

I have also tried configuring WinRM on the target with massive timeouts. Here's the current configuration:

C:\Windows\system32>winrm get winrm/config
Config
    MaxEnvelopeSizekb = 500
    MaxTimeoutms = 600000
    MaxBatchItems = 32000
    MaxProviderRequests = 4294967295
    Client
        NetworkDelayms = 10000
        URLPrefix = wsman
        AllowUnencrypted = true [Source="GPO"]
        Auth
            Basic = true [Source="GPO"]
            Digest = false [Source="GPO"]
            Kerberos = true
            Negotiate = true
            Certificate = true
            CredSSP = false
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        TrustedHosts = * [Source="GPO"]
    Service
        RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
        MaxConcurrentOperations = 4294967295
        MaxConcurrentOperationsPerUser = 1500
        EnumerationTimeoutms = 240000
        MaxConnections = 300
        MaxPacketRetrievalTimeSeconds = 240
        AllowUnencrypted = true [Source="GPO"]
        Auth
            Basic = true [Source="GPO"]
            Kerberos = true
            Negotiate = true
            Certificate = false
            CredSSP = false
            CbtHardeningLevel = Relaxed
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        IPv4Filter = *
        IPv6Filter = *
        EnableCompatibilityHttpListener = false
        EnableCompatibilityHttpsListener = false
        CertificateThumbprint
        AllowRemoteAccess = true
    Winrs
        AllowRemoteShellAccess = true
        IdleTimeout = 7200000
        MaxConcurrentUsers = 10
        MaxShellRunTime = 2147483647
        MaxProcessesPerShell = 50
        MaxMemoryPerShellMB = 1024
        MaxShellsPerUser = 30

Again, the timeout is random. Sometimes the process makes it all the way through, executing about 400 individual commands on the system. In some cases it will need to run more than 400 commands, and the connection may need to stay active for hours in order to collect the (sometimes massive) amounts of information it needs. Because the exception is intermittent, I am having a hard time determining the root cause. Any thoughts or ideas would be most welcome. Thanks in advance for any help!

Cheers,
-Bill M.
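
The innermost cause in the trace above is a plain `java.net.ConnectException: Connection timed out: connect`, raised while opening a new socket to port 5986. One way to check whether the network path itself is intermittently failing, independent of Overthere and WinRM, is to probe the port repeatedly with a bare TCP connect. The following is only a sketch under that assumption; the class name, the 5-second timeout, and the attempt count are hypothetical choices, and `MY-SERVER-NAME` is the placeholder used in this issue.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic, not part of Overthere.
class ConnectProbe {

    /** Tries one TCP connect; returns elapsed milliseconds on success, -1 on any I/O failure. */
    static long probeOnce(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            // Covers connect timeouts, refused connections, and unresolved hosts.
            return -1L;
        }
    }

    /** Runs several probes with a pause between them and returns how many failed. */
    static int countFailures(String host, int port, int timeoutMillis, int attempts, long pauseMillis) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            long elapsed = probeOnce(host, port, timeoutMillis);
            if (elapsed < 0) {
                failures++;
                System.out.println("attempt " + i + ": FAILED");
            } else {
                System.out.println("attempt " + i + ": connected in " + elapsed + " ms");
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop probing
                break;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "MY-SERVER-NAME";
        int failures = countFailures(host, 5986, 5000, 10, 1000L);
        System.out.println("failed probes: " + failures + " of 10");
    }
}
```

If probes like these fail at roughly the same rate as the program does, that would suggest looking at the network or the target's listener rather than at the client-side timeout options.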

Update: I am seeing the following in my logs when the connections are failing:

Connection released: [id: 373][route: {s}->https://MY-SERVER:5986][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]

I feel like the fact that the kept-alive count, the route allocation, and the total allocation are all 0 is significant in some way, but I don't know how...
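
To compare these counters across many log lines, the fields can be pulled out with a regular expression. The sketch below assumes the log line has exactly the shape quoted above; the class and method names are hypothetical. As far as I understand Apache HttpClient's pooling connection manager, the counters are reported as "in use plus idle, of maximum", so all zeros right after a release would mean the connection was closed rather than kept for reuse, and the next request has to open a fresh one. Treat that reading as an assumption to verify, not a confirmed diagnosis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper, not part of Overthere or HttpClient.
class PoolStats {
    final int keptAlive;
    final int routeAllocated;
    final int routeMax;
    final int totalAllocated;
    final int totalMax;

    private static final Pattern STATS = Pattern.compile(
        "total kept alive: (\\d+); route allocated: (\\d+) of (\\d+); total allocated: (\\d+) of (\\d+)");

    private PoolStats(int keptAlive, int routeAllocated, int routeMax, int totalAllocated, int totalMax) {
        this.keptAlive = keptAlive;
        this.routeAllocated = routeAllocated;
        this.routeMax = routeMax;
        this.totalAllocated = totalAllocated;
        this.totalMax = totalMax;
    }

    /** Extracts the pool counters from one "Connection released" style log line. */
    static PoolStats parse(String logLine) {
        Matcher m = STATS.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no pool stats in line: " + logLine);
        }
        return new PoolStats(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)));
    }

    /** True when the pool holds no connection at all, neither in use nor idle. */
    boolean poolIsEmpty() {
        return keptAlive == 0 && routeAllocated == 0 && totalAllocated == 0;
    }

    public static void main(String[] args) {
        String line = "Connection released: [id: 373][route: {s}->https://MY-SERVER:5986]"
            + "[total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]";
        PoolStats stats = parse(line);
        System.out.println("pool empty after release: " + stats.poolIsEmpty());
    }
}
```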

Is anyone looking at this issue? It is still occurring, still intermittent, and still baffling.
Cheers,
-Bill M.