hep-gc / shoal

A squid cache publishing and advertising tool designed to work in fast changing environments

Redundancy of Shoal servers

rptaylor opened this issue · comments

What would be required to run multiple Shoal servers in redundancy?
Currently, if the Shoal server is unavailable, the Shoal client can just fall back to a default list of squids, so the worst-case scenario isn't very bad. Regardless, the motivation for using Shoal is somewhat reduced if there is a single point of failure (SPOF).

I suppose additional development would be needed in the clients and agents, but possibly not the servers. How difficult would it be?

It would be a minor change in the client and quite a bit more complicated in the agent. For the client, you could add a backup server to the client config; if the REST request to the first server fails, catch the error, switch to the second server, and try again.
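A rough sketch of that client-side fallback, just to make the idea concrete. This is not the actual shoal-client code: `get_squids`, `http_fetch`, the URL list, and the assumption that the server returns a JSON document are all made up for illustration.

```python
import json
import urllib.request


def http_fetch(url, timeout=5):
    """Fetch and decode a JSON document from one server URL (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def get_squids(server_urls, default_squids, fetch=http_fetch):
    """Try each configured server in order; fall back to the static default list."""
    for url in server_urls:
        try:
            squids = fetch(url)
        except (OSError, ValueError):
            # urllib's URLError and socket timeouts are OSError subclasses;
            # a malformed JSON reply raises ValueError. Try the next server.
            continue
        if squids:
            return squids
    # No server answered with a usable list: keep today's behaviour.
    return default_squids
```

With two servers in the list, the second one is only contacted when the first request fails or returns nothing, and the existing default squid list remains the last resort.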

The agent could do something similar, but the problem is detecting when the shoal-server is unavailable. Because the agent uses AMQP, it will happily send messages to the queue as long as rabbitmq-server is active. So if the shoal-server dies but the AMQP server is still running, the agent has no idea that it isn't doing anything productive. If the whole VM/server hosting everything goes down, that's easy to detect, but if only shoal-server is down, something else would be needed.

The shoal-agent is nice because all communication is one-way, but as a result it lacks any sort of health checking. Some sort of callback or other detection method would be required to switch to a different shoal-server.

Alternatively (and perhaps better), the agent could be changed to advertise to multiple shoal-servers. That eliminates the need to switch between servers: the agent just sends its heartbeat messages to both of them. This doubles the number of messages sent, but they are pretty lightweight and distributed.

Yes, it would probably be better to have just a list of servers, and agents and clients would treat each one equally, instead of active/passive. Each server would be standalone and stateless.

There probably shouldn't be a need for server discovery though. They could just be statically defined.
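On the agent side, the static-list approach might look something like the sketch below. It is only an illustration, not the real shoal-agent: `advertise`, the `publish` callable (which would wrap whatever AMQP publish call the agent already makes), and the heartbeat fields are all hypothetical.

```python
import json


def advertise(heartbeat, brokers, publish):
    """Send one heartbeat to every statically configured broker.

    `publish(broker, body)` is a caller-supplied function (hypothetical)
    that delivers the message to one broker. Returns the brokers that failed.
    """
    body = json.dumps(heartbeat)
    failed = []
    for broker in brokers:
        try:
            publish(broker, body)
        except Exception:
            # One unreachable broker must not stop the others
            # from receiving the heartbeat.
            failed.append(broker)
    return failed
```

Every server is treated equally and there is no active/passive state to track: a broker that is down simply misses heartbeats until it comes back.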

Ian described a way to do this without any changes to Shoal: just set up some HA AMQP servers and put several shoal servers behind a load balancer, connecting to the AMQP cluster. That approach should be fine if needed.