Managing system restarts from within the components

vspinu opened this issue 7 years ago

Hi,

What is the idiomatic way to manage faults and restarts from within the system? Let's say I have a web socket and a couple of components that depend on that socket:

(require '[integrant.core :as ig])

(def config
  {:feed/db {:name :blabla
             :foo (atom nil)
             :bar (atom nil)}
   :feed/ws {:url "wss://ws-feed.example.com"}
   ;; both periodic tasks depend on the websocket and on the db
   :periodic/ping {:feed (ig/ref :feed/ws)
                   :period 1000
                   :db (ig/ref :feed/db)}
   :periodic/balances {:feed (ig/ref :feed/ws)
                       :period 60000
                       :db (ig/ref :feed/db)}})

When the websocket :feed/ws breaks for external reasons, I would like to automatically restart it and all the components that depend on it. Thanks.
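Restarting a key "and everything that depends on it" means finding the keys that reference it, directly or transitively, halting them before the key itself and starting them again afterwards. Integrant already orders keys by their references when it initializes and halts a system; the following is only a plain-Clojure sketch of that ordering, assuming the dependency graph of the config above has been written out by hand as a map from each key to the keys it references. The names `deps`, `dependents`, `start-order` and `restart-plan` are hypothetical.

```clojure
(ns example.restart-order)

;; Hand-written dependency graph for the config above (an assumption for
;; illustration): each key maps to the set of keys it references via ig/ref.
(def deps
  {:feed/db           #{}
   :feed/ws           #{}
   :periodic/ping     #{:feed/ws :feed/db}
   :periodic/balances #{:feed/ws :feed/db}})

(defn dependents
  "Returns the set of keys that depend on `k`, directly or transitively."
  [deps k]
  (loop [found #{} frontier #{k}]
    (let [next-keys (set (for [[candidate ds] deps
                               :when (and (not (found candidate))
                                          (some frontier ds))]
                           candidate))]
      (if (empty? next-keys)
        found
        (recur (into found next-keys) next-keys)))))

(defn start-order
  "Orders `ks` so that each key comes after every key in `ks` it depends on."
  [deps ks]
  (loop [remaining (set ks) ordered []]
    (if (empty? remaining)
      ordered
      ;; a key is ready once none of its deps are still waiting to start
      (let [ready (sort (filter #(not-any? remaining (get deps % #{})) remaining))]
        (when (empty? ready)
          (throw (ex-info "Dependency cycle" {:keys remaining})))
        (recur (reduce disj remaining ready) (into ordered ready))))))

(defn restart-plan
  "Halt the dependents before `k`, then start `k` before its dependents."
  [deps k]
  (let [start (start-order deps (conj (dependents deps k) k))]
    {:halt (vec (reverse start)) :start start}))
```

For `:feed/ws` the plan halts `:periodic/ping` and `:periodic/balances` first and starts them again only after the new websocket is up; `:feed/db` is left running.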

Vitalie Spinu · Answer 1 · Thu May 18 2017 18:17:34 GMT+0800 (China Standard Time)

The above use case is not limited to the components restarting themselves. For example, a watchdog component that checks the system health should be able to restart parts of the system as needed. Thus, such components must be able to access the system var somehow. How would you go about implementing this?

James Reeves · Answer 2 · Thu May 18 2017 22:58:57 GMT+0800 (China Standard Time)

There are two broad solutions to this.

One solution is that the component itself can take care of restarting connections. A common example of this is a SQL connection pool. If a connection is broken, the pool will give the user a new connection. All I/O connections should be wrapped in boundaries anyway, both to allow easier testing and to loosen the coupling between the connection and the rest of your code.
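A minimal sketch of that first solution in plain Clojure: the component keeps its connection in an atom and transparently opens a new one the next time it is used after the old one has dropped, so the components that depend on it never need restarting. `connect-fn`, `open-fn` and `write-fn` are hypothetical stand-ins for whatever the I/O library provides, and the `Sender` protocol is likewise invented for illustration.

```clojure
(ns example.self-healing)

(defprotocol Sender
  (send-msg! [this msg] "Sends msg, reconnecting first if the connection dropped."))

;; connect-fn: zero-arg fn that opens a new connection (hypothetical)
;; open-fn:    predicate telling whether a connection is still usable (hypothetical)
;; write-fn:   (write-fn conn msg) performs the actual write (hypothetical)
(defrecord ReconnectingSender [connect-fn open-fn write-fn conn-ref]
  Sender
  (send-msg! [_ msg]
    (let [conn (locking conn-ref
                 ;; only one thread at a time may decide to reconnect
                 (let [c @conn-ref]
                   (if (and c (open-fn c))
                     c
                     (reset! conn-ref (connect-fn)))))]
      (write-fn conn msg))))

(defn reconnecting-sender [connect-fn open-fn write-fn]
  (->ReconnectingSender connect-fn open-fn write-fn (atom nil)))
```

An `ig/init-key` method could return such a record, keeping the reconnection logic behind the protocol.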

Another solution is to introduce a watchdog component and pass in the components you want to watch as references. For example:

{:example/server {:port 8080}
 :example/watchdog #{#ig/ref :example/server}}

Perhaps the server adheres to a protocol that allows it to be restarted.
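If the server did expose such a protocol, one watchdog check might look roughly like this. The `Restartable` protocol, `check-once!` and the `FakeServer` record are hypothetical names for illustration, not part of Integrant; a real watchdog would call `check-once!` periodically from its own thread and would need some internal state to manage that thread.

```clojure
(ns example.watchdog)

(defprotocol Restartable
  (healthy? [this] "True if the component is working normally.")
  (restart! [this] "Restarts the component in place."))

(defn check-once!
  "Restarts every unhealthy component in `components` (e.g. the set of
  referenced components the watchdog receives) and returns those restarted."
  [components]
  (let [sick (vec (remove healthy? components))]
    (doseq [c sick]
      (restart! c))
    sick))

;; Stand-in server whose status lives in an atom (hypothetical, for testing).
(defrecord FakeServer [state]
  Restartable
  (healthy? [_] (= :up @state))
  (restart! [_] (reset! state :up)))
```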

Alternatively, and usually preferably, this can be done at a system level. If something goes wrong and we lose a connection or the server goes down, we can just log the problem then exit the application with a failure code. If we're using something like systemd to manage our application, it will be restarted automatically.
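The fail-fast approach can be as small as a default uncaught-exception handler that logs the problem and exits with a non-zero status, leaving the restart to the process supervisor (with systemd, for example, a unit using `Restart=on-failure`). In this sketch the exit function is a parameter, an assumption made only so the handler can be exercised without terminating the JVM; `install-fail-fast-handler!` is a hypothetical name.

```clojure
(ns example.fail-fast)

(defn install-fail-fast-handler!
  "Makes any uncaught exception on any thread log to stderr and then call
  `exit-fn` with status 1. `exit-fn` defaults to System/exit."
  ([] (install-fail-fast-handler! #(System/exit %)))
  ([exit-fn]
   (Thread/setDefaultUncaughtExceptionHandler
    (reify Thread$UncaughtExceptionHandler
      (uncaughtException [_ thread ex]
        (binding [*out* *err*]
          (println "Fatal error on thread" (.getName thread) "-" (.getMessage ex)))
        (exit-fn 1))))))
```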

Vitalie Spinu · Answer 3 · Fri May 19 2017 00:43:23 GMT+0800 (China Standard Time)

All I/O connections should be wrapped in boundaries anyway,

I guess this is what I was missing. I was exposing too much of the component to its children.

Are there reasonably complete examples of systems built on integrant somewhere? Something like those in system/examples? I have never used component nor mount, so I am having a bit of a struggle on the "grand-design" side of things.

Vitalie Spinu · Answer 4 · Fri May 19 2017 01:03:37 GMT+0800 (China Standard Time)

{:example/server {:port 8080} :example/watchdog #{#ig/ref :example/server}}

But in order for this to work I need to design the :example/server as mutable object such that I could restart it in-place. Is this what you meant?

James Reeves · Answer 5 · Fri May 19 2017 01:21:33 GMT+0800 (China Standard Time)

Ah, sorry, I should have explained further. Boundaries are a Duct concept. I've been writing a lot of Duct recently, and the new Duct alpha makes heavy use of Integrant, so I forgot I was replying to an issue on the Integrant repository rather than the Duct one.

So let me start again 😃.

I've found it good practice to avoid tightly coupling my functional code with the code that handles I/O. Some languages, like Haskell, enforce this distinction; in Clojure we have to have a little more self-discipline.

A websocket is a little complex for an example, because we need to handle channel closing, errors, and so forth. Instead, consider a SQL connection pool, which already does all those things for us. We could interact directly with the connection:

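(require '[clojure.java.jdbc :as jdbc])

(defn get-user [spec email]
  (jdbc/query spec ["SELECT * FROM users WHERE email = ?" email]))

;; db/connect-pool stands in for whichever connection-pool constructor you use.
(defmethod ig/init-key :database/sql [_ options]
  {:datasource (db/connect-pool options)})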

But I've found it's useful to add a layer in between to loosen the coupling:

(defprotocol Users
  (get-user [db email]))

(defrecord DatabaseBoundary [spec]
  Users
  (get-user [_ email]
    (jdbc/query spec ["SELECT * FROM users WHERE email = ?" email])))

(defmethod ig/init-key :database/sql [_ options]
  (->DatabaseBoundary {:datasource (db/connect-pool options)}))

In this example, the DatabaseBoundary record does very little, but it still provides us with a way of mocking out the database when required. In a more complex example, we can manage connections and reconnections, and use the protocol to abstract that away.
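For instance, a test could substitute an in-memory implementation of the same `Users` protocol for `DatabaseBoundary`. The `MockUsers` record, its map of users and the `user-name` function are hypothetical fixtures; since `jdbc/query` returns a sequence of rows, the mock returns a vector of zero or one row to match.

```clojure
(ns example.users-test)

;; Same protocol as in the example above.
(defprotocol Users
  (get-user [db email]))

;; In-memory stand-in for DatabaseBoundary: `users` maps email -> row map.
(defrecord MockUsers [users]
  Users
  (get-user [_ email]
    ;; mimic jdbc/query by returning a sequence of matching rows
    (if-let [row (get users email)] [row] [])))

(defn user-name
  "Example of code that depends only on the Users protocol."
  [db email]
  (:name (first (get-user db email))))
```

Code such as `user-name` never knows whether it is talking to the real pool or the mock, which is the loosened coupling described above.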

James Reeves · Answer 6 · Fri May 19 2017 01:29:37 GMT+0800 (China Standard Time)

But in order for this to work I need to design the :example/server as mutable object such that I could restart it in-place. Is this what you meant?

The :example/watchdog key would need some internal mutation, but not necessarily the :example/server key, so long as no other keys depended on the server.

However, now that I've thought about it a little further, I don't think I'd recommend a :example/watchdog approach. In general, I think the safest option is to stop the system when something unexpected happens. Checking the system's health shouldn't be necessary if we kill it off the moment it looks sick.

For connections and so forth, we can give the component itself a connection pool, or some way of restarting the connection when it's dropped. This is a common problem, so there may be libraries out there to simplify this.

Vitalie Spinu · Answer 7 · Fri May 19 2017 02:03:21 GMT+0800 (China Standard Time)

Boundaries are a Duct concept,

Yes. I read your note on boundaries here and will try to follow the advice from now on.

Checking the system's health shouldn't be necessary if we kill it off the moment it looks sick.

I have multiple websockets open (listening for transactions) and restarting the full application on every websocket disconnect is really not an option.

This is a common problem, so there may be libraries out there to simplify this.

I am using manifold, which provides an on-close callback but AFAICS has no provision for reconnection. So for now I will simply keep the connection in an atom or a mutable deftype and reset it when needed.
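A rough sketch of that plan: the component keeps the current connection in an atom and registers a close callback that opens a replacement. `connect!` is a hypothetical function that opens a connection and arranges for the supplied callback to run when it closes (with manifold this would presumably be wired through `manifold.stream/on-closed`). A `:stopped?` flag, set from the key's `ig/halt-key!` method, prevents reconnecting after shutdown. A real implementation would also want a backoff between attempts.

```clojure
(ns example.reconnecting-ws)

(defn start-reconnecting!
  "Opens a connection with (connect! on-close) and reopens it whenever it
  closes, until `stop!` is called on the returned atom. (:conn @state)
  always holds the current connection."
  [connect!]
  (let [state (atom {:conn nil :stopped? false})]
    (letfn [(open! []
              (when-not (:stopped? @state)
                ;; connect! is called before swap!, so it is never retried
                (swap! state assoc :conn (connect! open!))))]
      (open!))
    state))

(defn stop!
  "Prevents any further reconnection attempts."
  [state]
  (swap! state assoc :stopped? true))
```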

Thank you for all the input!