soabase / exhibitor

ZooKeeper co-process for instance monitoring, backup/recovery, cleanup and visualization.

Home Page: https://groups.google.com/forum/#!topic/exhibitor-users/PVkcd88mk8c


Rolling Ensemble Change not working

PMVDias opened this issue · comments


Hi,

When I apply some changes to the ZooKeeper config in Exhibitor (via the UI of server 1), it applies the change on server 1 and then just hangs, never applying it to the other 2 servers. Here is my config file:

# Auto-generated by Exhibitor - Wed Dec 09 03:52:39 EST 2015
# Wed Dec 09 03:52:39 EST 2015

server.3=10.74.151.174:2888:3888
server.2=10.74.151.173:2888:3888
server.1=10.74.151.231:2888:3888
initLimit=15
syncLimit=5
clientPort=2181
tickTime=2000
dataDir=/opt/zookeeper-3.4.7/Data
dataLogDir=/opt/zookeeper-3.4.7/Data

If I try to apply the changes from one of the other servers, it hangs straight away.
So it looks like it tries to apply the changes on the server with myid 1 first and then on the others, but since it can't connect to the others it just hangs!

This is my exhibitor.properties:

# Auto-generated by Exhibitor
# Tue Dec 08 11:47:08 EST 2015

com.netflix.exhibitor-rolling-hostnames=
com.netflix.exhibitor-rolling.zookeeper-data-directory=/opt/zookeeper-3.4.7/Data
com.netflix.exhibitor-rolling.servers-spec=S:1:10.74.151.231,\nS:2:10.74.151.173,\nS:3:10.74.151.174
com.netflix.exhibitor.java-environment=
com.netflix.exhibitor.zookeeper-data-directory=/opt/zookeeper-3.4.7/Data
com.netflix.exhibitor-rolling-hostnames-index=0
com.netflix.exhibitor-rolling.java-environment=
com.netflix.exhibitor-rolling.observer-threshold=0
com.netflix.exhibitor.servers-spec=S:1:10.74.151.231,\nS:2:10.74.151.173,\nS:3:10.74.151.174
com.netflix.exhibitor.cleanup-period-ms=43200000
com.netflix.exhibitor.auto-manage-instances-fixed-ensemble-size=3
com.netflix.exhibitor.zookeeper-install-directory=/opt/zookeeper-3.4.7
com.netflix.exhibitor.check-ms=30000
com.netflix.exhibitor.zookeeper-log-directory=
com.netflix.exhibitor-rolling.auto-manage-instances=0
com.netflix.exhibitor-rolling.cleanup-period-ms=43200000
com.netflix.exhibitor-rolling.auto-manage-instances-settling-period-ms=30000
com.netflix.exhibitor-rolling.check-ms=30000
com.netflix.exhibitor.log-index-directory=
com.netflix.exhibitor-rolling.log-index-directory=
com.netflix.exhibitor.backup-period-ms=0
com.netflix.exhibitor-rolling.connect-port=2888
com.netflix.exhibitor-rolling.election-port=3888
com.netflix.exhibitor-rolling.backup-extra=
com.netflix.exhibitor.client-port=2181
com.netflix.exhibitor-rolling.zoo-cfg-extra=initLimit=10&syncLimit=5&tickTime=2000
com.netflix.exhibitor-rolling.zookeeper-install-directory=/opt/zookeeper-3.4.7
com.netflix.exhibitor.cleanup-max-files=3
com.netflix.exhibitor-rolling.auto-manage-instances-fixed-ensemble-size=3
com.netflix.exhibitor-rolling.backup-period-ms=0
com.netflix.exhibitor-rolling.client-port=2181
com.netflix.exhibitor.backup-max-store-ms=0
com.netflix.exhibitor-rolling.cleanup-max-files=3
com.netflix.exhibitor-rolling.backup-max-store-ms=0
com.netflix.exhibitor.connect-port=2888
com.netflix.exhibitor.backup-extra=
com.netflix.exhibitor.observer-threshold=0
com.netflix.exhibitor.log4j-properties=
com.netflix.exhibitor.auto-manage-instances-apply-all-at-once=0
com.netflix.exhibitor.election-port=3888
com.netflix.exhibitor-rolling.auto-manage-instances-apply-all-at-once=0
com.netflix.exhibitor.zoo-cfg-extra=initLimit=10&syncLimit=5&tickTime=2000
com.netflix.exhibitor-rolling.zookeeper-log-directory=
com.netflix.exhibitor.auto-manage-instances-settling-period-ms=30000
com.netflix.exhibitor-rolling.log4j-properties=
com.netflix.exhibitor.auto-manage-instances=0

Any idea why?

Thanks,
Pedro

I'm seeing this too. I thought I had this working a few months back. I'm doing this with S3-backed configuration, and I see the lockfile is created. What logging level is appropriate to track down the cause?

We ran into this issue in DC/OS and have a patch in our fork of Exhibitor here: dcos/exhibitor#4 if you want a starting point for fixing this.

@BenWhitehead I know this issue is really old, but do you happen to know what the cause of this bug is? I see that in the PR you linked (dcos/exhibitor#4) you added a way to customize the configuration directory of ZooKeeper. What does that have to do with this bug?

@xiaochuanyu IIRC it all stems from this call to parseToConfig: the config object produced does no validation to ensure that the resulting config is valid; it blindly trusts the entire contents of the PUT. After receiving the PUT, it iterates over all the property keys it thinks should exist and tries to "get" each one from the JSON document sent; if a property is not defined, it defaults to the empty string, and that value is then persisted into the config. When the value of the property com.netflix.exhibitor-rolling.zookeeper-config-directory is later read, the empty string is passed along to ZooKeeper, resulting in ZooKeeper trying to use / as its base directory, which it doesn't have write permission to.
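The failure mode described above can be sketched as follows. This is not Exhibitor's actual parseToConfig code; the `getOrEmpty` helper and the map-based PUT body are hypothetical stand-ins that only illustrate how a key missing from the PUT silently becomes an empty string and gets persisted:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigDefaultingSketch {
    // Hypothetical stand-in for the defaulting behavior: a key that is
    // absent from the PUT body is read back as "" rather than rejected.
    static String getOrEmpty(Map<String, String> putBody, String key) {
        String value = putBody.get(key);
        return value != null ? value : ""; // missing key silently becomes ""
    }

    public static void main(String[] args) {
        // Simulated PUT body that omits the config-directory property.
        Map<String, String> putBody = new HashMap<>();
        putBody.put("com.netflix.exhibitor.client-port", "2181");

        // The missing property is persisted as "". When ZooKeeper later
        // resolves its config directory from that empty string, it ends
        // up rooted at "/", which it cannot write to.
        String configDir = getOrEmpty(putBody,
            "com.netflix.exhibitor-rolling.zookeeper-config-directory");
        System.out.println("persisted value: \"" + configDir + "\"");
    }
}
```

Because no validation happens between "get" and "persist", a single partial PUT is enough to blank out a previously valid setting.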

So if you send it a "good" JSON document, everything works out okay; if not, the Exhibitor config can be left in a bad state, incapable of recovering without external intervention.

The fix in dcos/exhibitor#4 updates the UI to include the original value of zookeeperConfigDirectory in the JSON document that is PUT back to the server, ensuring it is not set to the empty string.
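A server-side variant of the same idea (not the actual dcos/exhibitor#4 patch, which fixes the client instead) would be to fall back to the previously stored value whenever the PUT omits or blanks a property. This standalone sketch uses hypothetical maps for the stored and incoming config, and the `/opt/zookeeper-3.4.7/conf` path is only an illustrative guess based on the install directory above:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigMergeSketch {
    // Prefer the incoming value, but fall back to the existing config
    // when the PUT body omits or blanks the key, so a partial document
    // cannot wipe out a previously valid setting.
    static String mergedValue(Map<String, String> existing,
                              Map<String, String> incoming,
                              String key) {
        String value = incoming.get(key);
        if (value == null || value.isEmpty()) {
            return existing.getOrDefault(key, "");
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("zookeeper-config-directory", "/opt/zookeeper-3.4.7/conf");

        Map<String, String> incoming = new HashMap<>(); // PUT omits the key

        // Falls back to the stored path instead of persisting "".
        System.out.println(mergedValue(existing, incoming,
            "zookeeper-config-directory"));
    }
}
```

Either approach (fixing the client to always send the value, or the server to never accept a blank one) closes the window where the config gets stuck in an unrecoverable state.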

Hope this helps.