litaio / lita-hipchat

A HipChat adapter for Lita.

Auto reconnect capability

shusugmt opened this issue:

It would be nice if it reconnected automatically when the connection to HipChat is lost.

I run my bot on my laptop. Once I close my laptop, Lita never reconnects automatically, so I have to stop and relaunch it every time, which is a bit annoying.

I've written a PoC. @jimmycuadra, before I update the specs, could you take a look at my implementation?
shusugmt/lita-hipchat@0d9369d

Thanks for the POC. I think trying to manage reconnection within the client like this is probably a little too complex. What do you think about making Lita crash (or just calling shut_down explicitly) when the connection is lost? Then it can easily be restarted by the user's preferred process manager/init system. I like the idea of just letting it crash and then starting up again with a totally clean slate.
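
(For illustration, here is a minimal sketch of the "shut down on disconnect" idea. It is not lita-hipchat's actual code: the `shut_down_on_disconnect` helper is hypothetical, and it assumes the adapter's connector exposes the underlying xmpp4r Jabber::Client via a `client` reader. It relies on xmpp4r's Jabber::Stream#on_exception hook and Lita::Robot#shut_down.)

```ruby
require "lita"
require "lita-hipchat"

module Lita
  module Adapters
    class HipChat < Adapter
      # Hypothetical helper, to be called after connecting in #run:
      # when the XMPP stream dies, shut the whole bot down cleanly and
      # let an external process manager restart it with a clean slate.
      def shut_down_on_disconnect
        connector.client.on_exception do |error, _stream, _phase|
          Lita.logger.error("HipChat connection lost: #{error.inspect}")
          robot.shut_down
        end
      end
    end
  end
end
```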

@jimmycuadra Thank you. Your idea sounds better. I'll write a new PoC on another branch.

Just an opinion, but crashing Lita feels like using a sledgehammer to crack a nut. What if Lita is using more than one adapter, like Slack and HipChat? In that case, if HipChat goes down for more than a few seconds, Lita will be stuck in a join/leave cycle on Slack until HipChat comes back up.

Also, many process managers will only attempt to restart a service a finite number of times before giving up. If HipChat were down for long enough and the process manager gave up, Lita would never be restarted.

A single Lita process can't be used with more than one adapter, so that is not a problem in this case. I'm not aware of a process manager that can't keep attempting to restart a process infinitely – do you have examples? What would you suggest rather than letting the process exit?

While I'm not new to relying on external services from long-running processes, I'm really new to Lita. Thanks for pointing out the single-adapter thing.

systemd services are often configured with Restart=on-failure, so if Lita exits with status 0, the service won't be restarted. In Upstart, there's the respawn limit stanza.
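
(For reference, a minimal sketch of the kind of systemd unit being described; the service name, paths, and user are placeholders, not anything shipped with Lita.)

```ini
# Hypothetical /etc/systemd/system/lita.service
[Unit]
Description=Lita chat bot
After=network-online.target

[Service]
User=lita
WorkingDirectory=/opt/lita
ExecStart=/usr/local/bin/bundle exec lita
# Restart=on-failure only restarts on a non-zero exit status, so a clean
# exit (status 0) after a lost connection would leave the service down.
# Restart=always also covers clean exits.
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```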

As to how to deal with it, I'd think that applying graceful degradation similar to the way @s2ugimot did in shusugmt/lita-hipchat@0d9369d would require less work from the sysadmin and would ultimately be a more "set-it-and-forget-it" approach.

Hubot does it similarly: hipchat/hubot-hipchat#174

BTW: I didn't get to thank you for your ChatOps panel at DevOpsDaysRox. I was the one in the open space who talked about the "clock in" feature, the killer feature that got buy-in from the users. I've been working on replacing Hubot with Lita this week and have been really happy so far.

Interesting... thanks for the additional details. I'm not sure how I feel about where/how this issue should be handled. I'll keep thinking about it.

Glad you've been enjoying Lita so far! Getting feedback about it that can lead to more improvements is the best thanks I could ask for. :}

Another point to consider - a process supervisor is not a given. sysvinit is still widely used.

I also imagine how I might feel if a Java or Rails app I manage simply exited because a 3rd party service went away for a moment and it didn't try to reconnect.

Thanks for continuing to think about this.

@jimmycuadra I'm not a huge fan of this approach. If the WebSocket is closed, isn't that an error? We should handle errors, right?

Would you be averse to some configurable retry logic, perhaps with an exponential backoff?

lita-hipchat doesn't use WebSockets, but in any case, what type of error are you referring to? If there's something specific that is recoverable, then probably yes. I'm still hesitant about adding reconnection logic inside the adapter. I'd love to hear more specifics on how letting a process manager restart it has been a problem for folks.

@jimmycuadra Do you have any examples of other applications that take this approach? I'm not sure you'll get many specifics on how it's worked for people because I'm not sure it's a common thing to do. Currently we use CentOS 6 and sysvinit so there is no process supervisor. We don't have any processes that routinely bomb out when external services fail and I would call it a bug if they did.

I guess I'm amenable to this feature after more thought (thank you for all the comments). I don't love the idea of having to maintain it, but clearly there is demand and the goal is to make users' lives easier. I'm really just kind of gun shy about handling additional complexity in the adapter since XMPP and the xmpp4r library are already quite nasty, and I don't want to introduce things that might create more subtle bugs in the future. But if someone wants to take this on, I will accept a really solid PR. I think it will need to handle reconnection with an exponential backoff and a maximum number of attempts to be safe.
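
(To make that expectation concrete, here is a rough sketch of a reconnect loop with exponential backoff and a maximum number of attempts. The names are placeholders; `establish_connection` stands in for whatever actually re-opens the XMPP connection, and none of this is lita-hipchat API.)

```ruby
# Sketch only: exponential backoff with a cap on both the delay and the
# number of attempts, as described above.
MAX_ATTEMPTS = 10
BASE_DELAY   = 2    # seconds; delays grow 2, 4, 8, ...
MAX_DELAY    = 300  # cap so a long outage doesn't grow the wait unbounded

def establish_connection
  # Placeholder: in the adapter this would be the real XMPP connect call.
  raise IOError, "HipChat unreachable"
end

def reconnect_with_backoff
  attempts = 0
  begin
    attempts += 1
    establish_connection
  rescue IOError => e
    raise if attempts >= MAX_ATTEMPTS
    delay = [BASE_DELAY**attempts, MAX_DELAY].min
    warn "Reconnect attempt #{attempts} failed (#{e.message}); retrying in #{delay}s"
    sleep(delay)
    retry
  end
end
```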

We're running Lita in Kubernetes with built-in process monitoring. We have other processes running under God and even custom scripts. While I appreciate the views about it reconnecting, I also like simplicity. I think process monitoring isn't that unusual, and I would suggest running Lita under something even if it did handle this reconnect issue. After all, this isn't the only reason Lita might fail, and process monitors have a rich feature set for dealing with these events. So practically, if making Lita crash is an easy development task, I'd welcome it as an acceptable solution that moves Lita forward. When/if a pull request comes with the reconnect feature, I'll heed Jimmy's caution and let it bake a while before upgrading. Till then, it's manual intervention every time HipChat's servers have a blip. Thanks to everyone who's contributed to Lita.