thecodeteam / goscaleio

Archived repo for GoScaleIO

Home Page: https://github.com/dell/goscaleio

Unable to start multiple REX-Ray instances on different hosts connecting to ScaleIO

dvonthenen opened this issue:

Effectively, a program installs, configures, and starts rexray on several hosts at the same time. The first instance starts fine, but every instance after that fails. One problem is that rexray cannot be started until the ScaleIO gateway is functioning normally, because rexray contacts the gateway on startup. I am fairly sure that is “functions as designed”, but it means I need to start all the rexray instances only after the ScaleIO gateway is available.

To remedy this, I added an exponentially increasing backoff delay with a random component before each retry, but the services that failed still won't start. The odd thing is that once the program gives up after its configured number of retries, I log into each host, just run service start, and it works without changing anything.

rexray_works.log - The first instance; rexray starts fine.
rexray_failed.log - Instance #2; rexray fails.
rexray_retry.log - I ssh into instance #2. The first half of the log is the initial attempt above (rexray_failed.log); after that I just run rexray start and the service starts normally.

logs.zip

I found the problem: because I am doing everything programmatically, the gateway takes longer to go from started to actually usable than it takes to install, configure, and start rexray. So this looks like a non-issue.

@dvonthenen, can you implement a health check in your code? Maybe @cduchesne knows of an endpoint you can curl that returns an expected response once the gateway is up and running?