Intermittent Connection timeout on WinRM connections
wmunyan opened this issue
Hello,
I am experiencing some very strange behavior in my program, which needs to create a (somewhat) long-running WinRM connection to a remote Windows box. More often than not, my program executes partially, and then fails, producing the following:
Exception in thread "main" com.xebialabs.overthere.cifs.winrm.WinRmRuntimeIOException: Error when sending request to https://MY-SERVER-NAME:5986/wsman
Request:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<a:To xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">https://MY-SERVER-NAME:5986/wsman</a:To>
<a:ReplyTo xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">
<a:Address mustUnderstand="true">http://schemas.xmlsoap.org/ws/2004/08/addressing/role/anonymous</a:Address>
</a:ReplyTo>
<w:MaxEnvelopeSize xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="true">307200</w:MaxEnvelopeSize>
<a:MessageID xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing">uuid:D9CE72E3-E2EE-483D-9BEB-94BF4583FF08</a:MessageID>
<w:Locale xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="false" xml:lang="en-US"/>
<p:DataLocale xmlns:p="http://schemas.microsoft.com/wbem/wsman/1/wsman.xsd" mustUnderstand="false" xml:lang="en-US"/>
<w:OperationTimeout xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">PT3600.000S</w:OperationTimeout>
<a:Action xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing" mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/Receive</a:Action>
<w:SelectorSet xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd">
<w:Selector Name="ShellId">77098E84-D91C-4E9F-B26C-36997F9F1D7C</w:Selector>
</w:SelectorSet>
<w:ResourceURI xmlns:w="http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd" mustUnderstand="true">http://schemas.microsoft.com/wbem/wsman/1/windows/shell/cmd</w:ResourceURI>
</env:Header>
<env:Body>
<rsp:Receive xmlns:rsp="http://schemas.microsoft.com/wbem/wsman/1/windows/shell">
<rsp:DesiredStream CommandId="580F404C-D703-4A48-861C-96041F8E19CB">stdout stderr</rsp:DesiredStream>
</rsp:Receive>
</env:Body>
</env:Envelope>
Response:
[EMPTY]
    at com.xebialabs.overthere.cifs.winrm.WinRmClient.doSendRequest(WinRmClient.java:435)
    at com.xebialabs.overthere.cifs.winrm.WinRmClient.sendRequest(WinRmClient.java:345)
    at com.xebialabs.overthere.cifs.winrm.WinRmClient.receiveOutput(WinRmClient.java:182)
    at com.xebialabs.overthere.cifs.winrm.CifsWinRmConnection$2.run(CifsWinRmConnection.java:162)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to MY-SERVER-NAME:5986 [MY-SERVER-NAME/MY-SERVER-IP] failed: Connection timed out: connect
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at com.xebialabs.overthere.cifs.winrm.WinRmClient.doSendRequest(WinRmClient.java:414)
    ... 3 more
Caused by: java.net.ConnectException: Connection timed out: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
    ... 12 more
I have tried numerous timeout settings, including the following connection options and system properties:
connectionTimeoutMillis=0
socketTimeoutMillis=0
winrmTimeout=PT3600.000S
System.setProperty("jcifs.smb.client.connTimeout", "1200000")
System.setProperty("jcifs.smb.client.responseTimeout", "1200000")
System.setProperty("jcifs.smb.client.soTimeout", "1200000")
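One way to narrow this down, sketched under assumptions: the trace ends in `java.net.ConnectException: Connection timed out: connect`, which is raised while the TCP socket is being opened, before any WinRM message is exchanged. If a value of 0 reaches the socket as "no timeout" (an assumption about how the option is applied), the operating system's own connect limit would still apply. The `jcifs.smb.client.*` properties configure the jCIFS SMB client, so they would not be expected to affect an HTTP connect. The hypothetical `ConnectProbe` helper below is not part of Overthere; it times a bare TCP connect with an explicit limit. It is demonstrated against a throwaway local listener so that it runs anywhere.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical helper (not part of Overthere): times a bare TCP connect. */
final class ConnectProbe {

    /**
     * Tries to open a TCP connection within timeoutMillis.
     * Returns the elapsed time in milliseconds, or -1 if the connect failed or timed out.
     */
    static long probe(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            // A positive value bounds the wait on the Java side; 0 would mean no Java-side limit.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers SocketTimeoutException and ConnectException alike.
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Local listener used only so the example is self-contained.
        try (ServerSocket listener = new ServerSocket(0)) {
            long elapsed = probe("127.0.0.1", listener.getLocalPort(), 5000);
            System.out.println(elapsed >= 0 ? "connected in " + elapsed + " ms" : "connect failed");
        }
    }
}
```

Running the probe in a loop against the real host on port 5986 while the main program is failing could show whether plain TCP connects are also timing out intermittently, which would point toward the network or the listener rather than the WinRM timeout settings.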
I have also tried configuring WinRM on the target with massive timeouts. Here's the current configuration:
C:\Windows\system32>winrm get winrm/config
Config
    MaxEnvelopeSizekb = 500
    MaxTimeoutms = 600000
    MaxBatchItems = 32000
    MaxProviderRequests = 4294967295
    Client
        NetworkDelayms = 10000
        URLPrefix = wsman
        AllowUnencrypted = true [Source="GPO"]
        Auth
            Basic = true [Source="GPO"]
            Digest = false [Source="GPO"]
            Kerberos = true
            Negotiate = true
            Certificate = true
            CredSSP = false
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        TrustedHosts = * [Source="GPO"]
    Service
        RootSDDL = O:NSG:BAD:P(A;;GA;;;BA)(A;;GR;;;IU)S:P(AU;FA;GA;;;WD)(AU;SA;GXGW;;;WD)
        MaxConcurrentOperations = 4294967295
        MaxConcurrentOperationsPerUser = 1500
        EnumerationTimeoutms = 240000
        MaxConnections = 300
        MaxPacketRetrievalTimeSeconds = 240
        AllowUnencrypted = true [Source="GPO"]
        Auth
            Basic = true [Source="GPO"]
            Kerberos = true
            Negotiate = true
            Certificate = false
            CredSSP = false
            CbtHardeningLevel = Relaxed
        DefaultPorts
            HTTP = 5985
            HTTPS = 5986
        IPv4Filter = *
        IPv6Filter = *
        EnableCompatibilityHttpListener = false
        EnableCompatibilityHttpsListener = false
        CertificateThumbprint
        AllowRemoteAccess = true
    Winrs
        AllowRemoteShellAccess = true
        IdleTimeout = 7200000
        MaxConcurrentUsers = 10
        MaxShellRunTime = 2147483647
        MaxProcessesPerShell = 50
        MaxMemoryPerShellMB = 1024
        MaxShellsPerUser = 30
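As a small cross-check of the numbers above, the request sends `OperationTimeout` as `PT3600.000S`, while the service reports `MaxTimeoutms = 600000`. The sketch below only converts the ISO-8601 duration to milliseconds and compares the two values. The class and method names are hypothetical, and how the WinRM service treats a requested timeout larger than its maximum is not something this sketch asserts.

```java
import java.time.Duration;

/** Hypothetical helper: compares a requested OperationTimeout with the service's MaxTimeoutms. */
final class TimeoutCheck {

    /** True if the ISO-8601 duration is longer than maxTimeoutMs milliseconds. */
    static boolean exceedsServerMax(String operationTimeout, long maxTimeoutMs) {
        return Duration.parse(operationTimeout).toMillis() > maxTimeoutMs;
    }

    public static void main(String[] args) {
        String winrmTimeout = "PT3600.000S"; // value from the request header in this issue
        long maxTimeoutMs = 600_000L;        // MaxTimeoutms from the winrm config in this issue
        System.out.println("requested ms: " + Duration.parse(winrmTimeout).toMillis());
        System.out.println("exceeds server max: " + exceedsServerMax(winrmTimeout, maxTimeoutMs));
    }
}
```

If the comparison reports that the requested value is larger, it may be worth retrying with a `winrmTimeout` at or below the configured maximum to rule out the mismatch as a factor.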
Again, the timeout occurs at random. Sometimes the process makes it all the way through, executing about 400 individual commands on the system. In some cases more than 400 commands will run, and the connection may need to stay active for hours to collect the (sometimes massive) amount of information it needs. Any thoughts or ideas would be most welcome. Because the exception is random, I am having a hard time determining the root cause. Thanks to anyone who can help!
Cheers,
-Bill M.
Update: I am seeing the following in my logs when the connections are failing:
Connection released: [id: 373][route: {s}->https://MY-SERVER:5986][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
I feel like the fact that the kept-alive, route-allocated, and total-allocated counts are all 0 is significant in some way, but I don't know how.
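For reading that line, a hedged interpretation: it looks like the debug output of Apache HttpClient's `PoolingHttpClientConnectionManager`, which appears in the stack trace, and the "of 2" and "of 20" figures match that manager's default limits of 2 connections per route and 20 in total. If so, a kept-alive count of 0 together with a total allocation of 0 right after a release would suggest the connection was closed rather than returned to the pool, so the next request has to open a new TCP connection. That would fit the `establishRoute` and `connect` frames in the trace. The hypothetical `PoolLogLine` helper below simply extracts the counters so that many such lines can be scanned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the pool counters from a "Connection released" log line. */
final class PoolLogLine {

    private static final Pattern STATS = Pattern.compile(
            "total kept alive: (\\d+); route allocated: (\\d+) of (\\d+); total allocated: (\\d+) of (\\d+)");

    /** Returns {keptAlive, routeAllocated, routeMax, totalAllocated, totalMax}, or null if the line does not match. */
    static int[] parse(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            return null;
        }
        int[] stats = new int[5];
        for (int i = 0; i < stats.length; i++) {
            stats[i] = Integer.parseInt(m.group(i + 1));
        }
        return stats;
    }

    /** True when no connection remains pooled, so the next request would need a fresh connect. */
    static boolean poolEmptyAfterRelease(int[] stats) {
        return stats[0] == 0 && stats[3] == 0;
    }

    public static void main(String[] args) {
        String line = "Connection released: [id: 373][route: {s}->https://MY-SERVER:5986]"
                + "[total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]";
        int[] stats = parse(line);
        System.out.println("pool empty after release: " + poolEmptyAfterRelease(stats));
    }
}
```

Comparing these counters between successful and failing runs could show whether connections are ever being reused, or whether every one of the roughly 400 commands opens its own connection.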
Is anyone looking at this issue? It is still occurring, still intermittent, and still baffling.
Cheers,
-Bill M.