jclouds / legacy-jclouds

Home Page: https://jclouds.apache.org

RunScriptOnNode is failing sometimes in 1.6.0-rc2 because of wrong IP

jaiganeshm opened this issue · comments

I recently upgraded to 1.6.0 (rc2) and am calling runScriptOnNode on an EC2 instance. In some cases, I get the following error.
Apparently, the SSH connection is attempted on the instance's private IP instead of its public IP.

6:45:40.505 [SimpleAsyncTaskExecutor-1] ERROR SLF4JLogger << (root:rsa[fingerprint(20:05:08:81:8c:01:99:fc:30:29:23:e6:c3:6b:12:42),sha1(64:43:e5:ec:da:8f:a6:88:f9:a2:9b:8c:0e:d9:41:cb:e5:4e:51:41)]@10.195.7.24:22) error acquiring {hostAndPort=10.195.7.24:22, loginUser=root, ssh=null, connectTimeout=7200000, sessionTimeout=7200000} (out of retries - max 7): Exhausted available authentication methods
net.schmizz.sshj.userauth.UserAuthException: Exhausted available authentication methods
at net.schmizz.sshj.userauth.UserAuthImpl.authenticate(UserAuthImpl.java:114) ~[sshj-0.8.1.jar:na]
at net.schmizz.sshj.SSHClient.auth(SSHClient.java:205) ~[sshj-0.8.1.jar:na]
at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:305) ~[sshj-0.8.1.jar:na]
at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:324) ~[sshj-0.8.1.jar:na]
at org.jclouds.sshj.SSHClientConnection.create(SSHClientConnection.java:144) ~[jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.sshj.SSHClientConnection.create(SSHClientConnection.java:40) ~[jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.sshj.SshjSshClient.acquire(SshjSshClient.java:193) [jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.sshj.SshjSshClient.connect(SshjSshClient.java:223) [jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.compute.callables.RunScriptOnNodeUsingSsh.call(RunScriptOnNodeUsingSsh.java:80) [jclouds-compute-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.compute.internal.BaseComputeService.runScriptOnNode(BaseComputeService.java:614) [jclouds-compute-1.6.0-rc.4.jar:1.6.0-rc.4]

More Notes:

The ConcurrentOpenSocketFinder class tries to identify a reachable IP among the two IPs available for the node (public and private).
In this case, a machine on my local network happened to have the exact IP that Amazon assigned to the node as its private IP, so the socket connect test to the private IP succeeded.
jclouds then tried to SSH to that address, and authentication failed because it was talking to a different machine that does not have the node's key.

The following method builds the FluentIterable by concatenating the public addresses first, followed by the private addresses. Even so, the SSH connection was attempted on the private IP.

    private static FluentIterable<String> checkNodeHasIps(NodeMetadata node) {
        // Public addresses come first in the iteration order, then private ones.
        FluentIterable<String> ips = FluentIterable.from(concat(node.getPublicAddresses(), node.getPrivateAddresses()));
        checkState(size(ips) > 0, "node does not have IP addresses configured: " + node);
        return ips;
    }

From Adrian:
I think the reason would be evident in the code that calls the method pasted above. At any rate, I'd guess it is more about which socket test completed first, given that the tests run in parallel. The code should prefer the local address, as that's cheaper in public clouds. Custom routing is possible by subclassing this class and binding the subclass in a Guice module passed to ContextBuilder.modules.
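
To illustrate the "which socket test completed first" point, here is a minimal sketch of the alternative: probe every candidate address in parallel, but choose the winner by list order rather than by completion time. This is not jclouds code. The class name PreferredAddressPicker, the method firstReachable, and the reachable predicate are all hypothetical, and only the JDK is used. The caller decides the preference order (public first or private first).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper, not part of jclouds.
public class PreferredAddressPicker {

    /**
     * Probes all addresses concurrently, then returns the first address in
     * list order whose probe succeeded. The result does not depend on which
     * probe happened to finish first.
     */
    public static Optional<String> firstReachable(List<String> preferred,
                                                  Predicate<String> reachable) {
        if (preferred.isEmpty()) {
            return Optional.empty();
        }
        ExecutorService pool = Executors.newFixedThreadPool(preferred.size());
        try {
            // Start one probe per address so slow probes overlap.
            List<Future<Boolean>> results = new ArrayList<>();
            for (String ip : preferred) {
                Callable<Boolean> probe = () -> reachable.test(ip);
                results.add(pool.submit(probe));
            }
            // Read the results in preference order, not completion order.
            for (int i = 0; i < preferred.size(); i++) {
                try {
                    if (results.get(i).get()) {
                        return Optional.of(preferred.get(i));
                    }
                } catch (ExecutionException e) {
                    // A probe that threw is treated as unreachable.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
            return Optional.empty();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With the list ordered public first, a private IP that collides with a machine on the local network would only be chosen if the public address were unreachable. The trade-off is that the call waits for the preferred probe to finish even when a lower-priority address has already answered.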

More discussion on this is available here:
https://groups.google.com/forum/?fromgroups=#!topic/jclouds/TBpDtt9jaTo