tj / axon

message-oriented socket library for node.js heavily inspired by zeromq

Akka Style Network

groundwater opened this issue

Hi guys, I love axon. It does a great job.

I do however think it's too low-level for a lot of services, and am thinking of building an Akka-style network on top of axon.

My questions to you are:

  1. Have you already done this, and I missed it?
  2. Are you interested in working together?
  3. Do you think it's a stupid idea?

I would like path-based messaging such as //america/white-house/president#hello
where any node can be messaged once you're connected to the network.
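A minimal sketch of how such a path might be split into name segments and a mailbox. The layout (leading `//`, `/`-separated segments, `#` before the mailbox) is inferred from the example above, and `parseActorPath` is a hypothetical helper, not part of axon:

```javascript
// Hypothetical parser for paths like "//america/white-house/president#hello".
// Assumed layout: "//" + slash-separated name segments + optional "#mailbox".
function parseActorPath(path) {
  const match = /^\/\/([^#]+?)(?:#(.+))?$/.exec(path);
  if (!match) throw new Error('not an actor path: ' + path);
  return {
    segments: match[1].split('/').filter(Boolean), // e.g. ['america', 'white-house', 'president']
    mailbox: match[2] || null                      // e.g. 'hello', or null when omitted
  };
}

const parsed = parseActorPath('//america/white-house/president#hello');
console.log(parsed.segments.join(' > '), '#', parsed.mailbox);
```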

It's easy to dream up all sorts of fancy things the network could do, but I would prefer to focus on a small set of message scenarios, and on robustness and fault tolerance first.

Sorry if this isn't the best place for this message, we can migrate the issue somewhere else if necessary.

I'm not familiar with Akka, but we have some (internal) high-level infrastructure built on axon; nothing I'm happy enough with to open-source yet. I agree that it's pretty much required to have something high-level, in our case mostly just to keep all that glue code out of the app itself. There are quite a few things I'd love to have, but we've only implemented some of them so far.

My current goals/requirements for an alpha would be:

Must Haves:

  1. Path-Based Messaging — Message a path, and whomever is registered to the path receives it.
  2. Request-Reply — While other message-styles are important, I would be happy to start with this.

Nice to Haves:

  1. Single Connection URL — Each node need only connect to a globally unique network name. This can be the URL of a central server, or some more abstract concept like an Active Directory Domain.
  2. No Quiet Failures — If a message cannot be delivered/queued the sender will know.
  3. Fault-Tolerance — temporary disconnects should not kill the network.
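As a rough illustration of the must-haves plus "no quiet failures", here is an in-memory sketch. `PathNetwork`, `register` and `request` are hypothetical names, not axon APIs, and there is no network in between; the point is only that an undeliverable request reaches the caller as an error:

```javascript
// In-memory sketch of the alpha requirements: whoever registers a path
// receives messages sent to it, a request gets a reply, and a message to an
// unregistered path fails loudly instead of vanishing.
// PathNetwork, register and request are hypothetical names.
class PathNetwork {
  constructor() {
    this.handlers = new Map(); // path -> handler(message, reply)
  }

  register(path, handler) {
    this.handlers.set(path, handler);
  }

  // callback(err, reply): err is set when nobody is registered for the path.
  request(path, message, callback) {
    const handler = this.handlers.get(path);
    if (!handler) {
      callback(new Error('no handler registered for ' + path));
      return;
    }
    handler(message, (reply) => callback(null, reply));
  }
}

const network = new PathNetwork();
network.register('//america/white-house/president#hello', (who, reply) => reply('hello, ' + who));
network.request('//america/white-house/president#hello', 'world', (err, reply) => console.log(reply));
network.request('//america/senate#hello', 'world', (err) => console.log(err.message));
```

A real version would put a transport between `request` and the handler; the sketch only fixes the contract the caller sees.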

Does this closely match your internal project?

somewhat yup, so far we have (not 100% robust yet though):

  • cross-node tracing and graphing
  • service registry with semver node requests
  • "cloud" canvas graphing of the network topology, showing which nodes are talking to which and their memory usage, etc.

so far our data is pretty transient; it's not a huge deal if we drop a few messages, so we have no queue offloading yet. our service node would be more like the path-based thing you're talking about: you can have N of a given node that other nodes can request for use, and if a node or the registry dies they all hook back up autonomously. Some things we don't have right now include redundancy for the registry, possibly a nicer API with a bit more discovery, and general robustness, which is not up to par yet.

we're also managing nodes with mongroup/mon right now, which works fine, but ideally it would be awesome if we could run most things in a single process, actor-style, during development, then describe the topology in JSON and have it do all that for us in production. With a VM it's not a huge deal to have all these nodes, but it could probably be more elegant.

here's the graphing stuff in dev, it would be nice to redo this with html instead of canvas and have some more interactive stuff for logging, clustering etc but that's of course slightly overkill :D haha
[Screenshot: network topology graph in dev, 2013-01-23 3:32:42 PM]

Wow, talk about flair! I had nothing this fancy in mind, but it's very cool.

I like the idea of describing the physical topology externally, I would rather the developer not care about where/how nodes are setup and running.

I think obviously a name-server/registry is necessary. I envision an API like the following:

  1. connect to registry using a single URL
  2. nodes register event handlers for incoming messages
  3. nodes can emit messages to any name/path on the network

The registry does the following:

  1. routes messages between nodes
  2. names nodes based on node instance data sent during connection, i.e. node-type, hostname, ip, etc
  3. operates a persistent queue for robustness and fault-tolerance

There are many more things I would like to add, but I think the above represents a minimal functional concept that is both useful and feasible.
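The registry responsibilities above can be sketched in a few dozen lines. This is a hypothetical in-memory version: the `type@hostname` naming scheme is an assumption, and a plain `Map` stands in for the persistent queue:

```javascript
// Sketch of the proposed registry. It names nodes from the instance data they
// send on connect, routes messages between them, and queues messages for
// nodes that are not connected. The in-memory Map stands in for the
// persistent queue; Registry and its methods are hypothetical.
class Registry {
  constructor() {
    this.nodes = new Map();  // name -> deliver(message) for connected nodes
    this.queues = new Map(); // name -> messages waiting for that node
  }

  // Derive a name such as "thumbs@web-01" from node instance data.
  static nameFor(info) {
    return `${info.type}@${info.hostname}`;
  }

  connect(info, deliver) {
    const name = Registry.nameFor(info);
    this.nodes.set(name, deliver);
    // Flush anything queued while the node was away, oldest first.
    const pending = this.queues.get(name) || [];
    this.queues.delete(name);
    pending.forEach((message) => deliver(message));
    return name;
  }

  disconnect(name) {
    this.nodes.delete(name);
  }

  route(to, message) {
    const deliver = this.nodes.get(to);
    if (deliver) {
      deliver(message);
      return;
    }
    if (!this.queues.has(to)) this.queues.set(to, []);
    this.queues.get(to).push(message);
  }
}

const registry = new Registry();
const inbox = [];
registry.route('thumbs@web-01', 'resize a.png'); // queued: not connected yet
const name = registry.connect({ type: 'thumbs', hostname: 'web-01' }, (m) => inbox.push(m));
registry.route(name, 'resize b.png');             // delivered immediately
```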

our registry thing (we're just calling it a switch node) pretty much just tosses around the info for available services and their associated metadata; it doesn't route messages (currently), so it's not a bottleneck at all, but partition tolerance would be nice.

this is what the node registration looks like, address and pub are just arbitrary junk so that the requester can connect (ideally this is a bit more automated):

  node.register('thumbs', pkg.version, {
    address: 'http://' + server.addr.address + ':' + server.addr.port,
    pub: addr.string
  });

and the node requesting looks like:

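  var axon = require('axon');
  var debug = require('debug')('thumbs');

  // `node` is our internal registry client (not part of axon).
  var sock = axon.socket('sub');

  // Ask the registry for any version of the "thumbs" service.
  node.find('thumbs', '*');

  // Fires whenever a matching "thumbs" node (re)appears:
  // point the subscriber at its advertised pub address.
  node.on('connect thumbs', function(service){
    debug('reconnect thumbs to %j', service.meta);
    sock.close();
    sock.connect(service.meta.pub);
  });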

not much but still a bit of manual stuff in there, it's pretty abstract right now though, the remote service could be http, axon, zmq, anything, that limits the automation though
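To make the register/find flow concrete, here is a hypothetical in-memory stand-in for such a registry client. It only mirrors the calls used above (`register`, `find`, `'connect <name>'` events); the version check is simplified to `'*'` or an exact version, and the real internal client surely differs:

```javascript
const EventEmitter = require('events');

// Hypothetical in-memory stand-in for the registry client used above.
// Version matching is simplified to "*" or an exact version; real semver
// ranges would need more work.
function versionMatches(version, range) {
  return range === '*' || range === version;
}

class ServiceRegistry extends EventEmitter {
  constructor() {
    super();
    this.services = []; // registered services: { name, version, meta }
    this.wanted = [];   // outstanding find() requests: { name, range }
  }

  register(name, version, meta) {
    const service = { name, version, meta };
    this.services.push(service);
    // Tell anyone who already asked for a matching service.
    if (this.wanted.some((w) => w.name === name && versionMatches(version, w.range))) {
      this.announce(service);
    }
  }

  find(name, range) {
    this.wanted.push({ name, range });
    this.services
      .filter((s) => s.name === name && versionMatches(s.version, range))
      .forEach((s) => this.announce(s));
  }

  // Emit on the next tick so listeners attached right after find() still fire.
  announce(service) {
    process.nextTick(() => this.emit('connect ' + service.name, service));
  }
}
```

Deferring the event means the snippet's order (call `find` first, attach the `'connect thumbs'` listener second) still works.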

I still haven't used much Go but I was curious how they usually stitch this sort of thing together. The goroutines abstraction seems like a great fit per-machine since you can just scale up the thread usage, but then that just breaks down at the network again anyway, something similar but more abstract would be nice, it'll look lame with node though, callbacks all over

Looking at what you have there, it actually looks more like what I would call a service discovery protocol.

I was also looking at building a tool like this. I am a fan of the 12-factor app and pass in my external dependencies via environment variables. I use node-foreman locally, but in production I would love a tool that discovered external services automatically, then kickstarted my app with the proper environment.
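A minimal sketch of the "discover, then kickstart with the right environment" step. The shape of the discovery result (service name to URL) and the `NAME_URL` variable convention are assumptions; only `child_process.spawn` is a real Node API here:

```javascript
const { spawn } = require('child_process');

// Turn discovered services into 12-factor style environment variables.
// The { name: url } input and the NAME_URL convention are assumptions.
function envFromServices(services, baseEnv = {}) {
  const env = { ...baseEnv };
  for (const [name, url] of Object.entries(services)) {
    // "thumbs" -> THUMBS_URL, "user-db" -> USER_DB_URL
    env[name.toUpperCase().replace(/[^A-Z0-9]/g, '_') + '_URL'] = url;
  }
  return env;
}

// Start the app (e.g. `node server.js`) with the discovered environment.
function kickstart(command, args, services) {
  return spawn(command, args, {
    env: envFromServices(services, process.env),
    stdio: 'inherit'
  });
}
```

The app itself stays unaware of discovery; it just reads `process.env.THUMBS_URL` as it would under node-foreman.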

Anyways, regarding the message network, I only imagine a central router initially. As the project evolves, I would like to build in support for multiple redundant routers, or even direct node-to-node routes. Whatever happens it would be great if the code using the network was unaware of the topology.

I have tried Go, I like it. I wouldn't do this project in Go, but I think it's a great language for network libraries and apps.

I hate callbacks. I have also been tinkering with a library that implements the monadic typeclasses from Scala, and I've been hacking away at the CoffeeScript compiler so that it understands the following do-comprehension:

  do
    name    <- some_async_op()
    account <- another_async_op( name )
    status  <- something_that_might_return_null( account )
    return console.log status
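For comparison, roughly what that comprehension could desugar to in plain JavaScript with Promises: each `x <- op` becomes an awaited step, and a null result short-circuits the rest, Maybe-style. `doChain` and the three operations are hypothetical stand-ins, and unlike a real do-comprehension each step here only sees the previous result, not every earlier binding:

```javascript
// Run async steps in order, feeding each result into the next one.
// A null/undefined result stops the chain, like Nothing in a Maybe monad.
async function doChain(seed, steps) {
  let value = seed;
  for (const step of steps) {
    value = await step(value);
    if (value == null) return null;
  }
  return value;
}

// Hypothetical stand-ins for the operations in the comprehension.
const someAsyncOp = async () => 'alice';
const anotherAsyncOp = async (name) => ({ name, active: name === 'alice' });
const somethingThatMightReturnNull = async (account) => (account.active ? 'ok' : null);

doChain(undefined, [someAsyncOp, anotherAsyncOp, somethingThatMightReturnNull])
  .then((status) => console.log(status));
```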

\end{ Tangent }

read some of the Akka stuff; definitely similar to what I wanted, more or less just Erlang without Erlang haha. node's shitty isolation kinda hurts for this stuff, but whatever, we're really invested in it for now

Akka is cool, for sure.

I'm gonna write up a short design doc later, I'll link it here when I'm done.

I strongly agree that the topology should be independent of the code itself, we'd like to get away from that as well

What are your feelings on CoffeeScript?

Anyways, here is my initial draft called Federation

I would appreciate your thoughts on the draft.

Awesome stuff. Love the graph of the peers @visionmedia.

@jacobgroundwater There is already request/reply support, or are you thinking something different? Also, there used to be a router for "identity" based messaging but it was awkward. Check out this issue (see the last comment from me) for what I have running (which still is not great).

@gjohnson looks like you've made a good run at this problem as well.

I suppose I am approaching this problem differently. Rather than build a router into an Axon socket, I am using Axon as the underlying transport of the message network. Most likely, the payloads submitted to the socket will be wrapped inside a Federation packet of sorts.
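A sketch of what such an envelope might look like. The field names (`v`, `to`, `from`, `replyTo`, `body`) are assumptions, since the draft only says payloads will be enveloped; `encode`/`decode` show what would cross the wire if the socket did not already handle serialization:

```javascript
// Hypothetical "Federation packet": routing metadata around an app payload.
function envelope(to, from, body, replyTo = null) {
  return { v: 1, to, from, replyTo, body };
}

function encode(packet) {
  return Buffer.from(JSON.stringify(packet));
}

// Reject anything that does not look like a version-1 packet.
function decode(buf) {
  const packet = JSON.parse(buf.toString());
  if (packet.v !== 1 || typeof packet.to !== 'string') {
    throw new Error('not a Federation packet');
  }
  return packet;
}
```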

I am modelling the network off of Akka, which has no direct concept of request/reply, only asynchronous messages. They use temporary actors to achieve a request/reply model, which I am calling anonymous handlers. I chose the Emitter socket type because I only wanted a single socket open, and it handles wire-serialization.
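The "anonymous handler" idea can be sketched on top of fire-and-forget messaging: `ask` registers a temporary handler under a unique name, sends the request with that name as `replyTo`, and drops the handler after the first reply. An `EventEmitter` stands in for the network, and `ActorSystem`, `tell` and `ask` are hypothetical names:

```javascript
const EventEmitter = require('events');

// Sketch of Akka-style ask built from tell plus a temporary handler.
class ActorSystem {
  constructor() {
    this.bus = new EventEmitter(); // stand-in for the network
    this.nextId = 0;
  }

  on(name, handler) {
    this.bus.on(name, handler);
  }

  // Fire-and-forget; still refuses to drop a message nobody can receive.
  tell(to, body, replyTo = null) {
    if (this.bus.listenerCount(to) === 0) throw new Error('no actor named ' + to);
    this.bus.emit(to, { body, replyTo });
  }

  // Request-reply via an anonymous handler that removes itself after one reply.
  ask(to, body) {
    const replyTo = '$anon/' + this.nextId++;
    return new Promise((resolve, reject) => {
      this.bus.once(replyTo, (message) => resolve(message.body));
      try {
        this.tell(to, body, replyTo);
      } catch (err) {
        this.bus.removeAllListeners(replyTo); // do not leak the temporary handler
        reject(err);
      }
    });
  }
}
```

Usage: an actor registered with `system.on('echo', (msg) => system.tell(msg.replyTo, msg.body))` answers `system.ask('echo', 'hi')`.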

In the end, I think it comes down to programming style. I use a lot of indirection; I like to layer my projects.
Axon is the foundation, Federation uses a subset of Axon to abstract away inter-host communication. I am not trying to handle every use-case yet; I would prefer to make a small, stable module then expand.

Right now I am just writing my design document. I am trying to narrow the functionality down to the minimum set of features necessary for an alpha. Again, I want to layer on top of Axon in much the same way that a TCP/IP network stack is layered.

I am not sure if this addresses your needs, but it would be awesome if more people got involved.

you don't want to know my thoughts on coffeescript :p haha

Ha, I think you just told me.

I have been playing with this for a bit and have a few thoughts.

  1. Akka/Erlang use a supervisor model, where actors are responsible for booting and maintaining child actors. I am going to avoid this model to start.
  2. I want to be able to have multiple actors per process, the Axon socket is just a transport protocol used when crossing the network.
  3. I think using real routes is the way to go at first, i.e. //192.168.0.21/actor#mailbox. Location-agnostic names can be added later in the form of an alias.
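A sketch of point 3: real, location-bearing routes pass through untouched, and location-agnostic aliases are a lookup layered on top later. The alias table and the `//host/actor#mailbox` split are assumptions based on the example route:

```javascript
// Hypothetical alias table: location-agnostic name -> real route.
const aliases = new Map([
  ['president', '//192.168.0.21/president#inbox']
]);

// Real routes (starting with "//") are used as-is; anything else is an alias.
function resolveRoute(name) {
  if (name.startsWith('//')) return name;
  const route = aliases.get(name);
  if (!route) throw new Error('unknown alias: ' + name);
  return route;
}

// Split "//192.168.0.21/actor#mailbox" into its parts.
function splitRoute(route) {
  const m = /^\/\/([^/]+)\/([^#]+)#(.+)$/.exec(route);
  if (!m) throw new Error('bad route: ' + route);
  return { host: m[1], actor: m[2], mailbox: m[3] };
}
```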

I think I should separate out network transport as its own problem, but I had a few thoughts on that as well. There are two ways to network the hosts: a star topology, or peer-to-peer with a central name server.

I think Axon is perfectly designed for the peer-to-peer approach, but actually fails with a centralized host. Axon sockets are send- or receive-oriented, whereas a star topology requires fully bi-directional sockets. A bi-directional solution would require each host to open a socket to listen on, but at that point you might as well go peer-to-peer.

Peer-to-peer is generally avoided whenever firewalls and proxies are in play, but I think the application of an actor system will generally occur in a VPC with full network access control.

Since the transport problem is actually a different problem from messaging, it makes sense to decouple the two interfaces.

"fails with a centralized host": not sure I follow there, but that's also not such a bad thing haha. For me at least, I think ultimately axon shouldn't even be noticeable at whatever this higher abstraction is. Abstracting away which process/thread/machine the thing lives on is pretty nice from a dev standpoint and flexible in production. I haven't deployed anything in Erlang, so I'm not overly familiar with what their processes normally look like, but I definitely think there's quite a bit we could do.

As for bi-directional stuff we may be able to improve that, if either end goes down we already have most of the logic in place to re-establish the connections so it should be pretty trivial

Re: "fails with a centralized host"

Sorry, I should have clarified. When a centralized host is used, only the server needs to listen for incoming connections. Clients need only connect, and because a TCP socket is bi-directional, messages can be sent and received on the same connection. With Axon sockets, both the client and the server would need to listen for incoming connections.
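To illustrate the TCP point, here is a sketch of the star case using Node's `net` module directly (not axon): only the hub listens, and each client sends and receives over the single connection it opened. Newline-delimited JSON framing and the `{ ack }` reply are assumptions for illustration:

```javascript
const net = require('net');

// Parse newline-delimited JSON from a socket, one message per line.
function onLines(socket, onMessage) {
  let buffered = '';
  socket.setEncoding('utf8');
  socket.on('data', (chunk) => {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      onMessage(JSON.parse(line));
    }
  });
}

// Only the hub listens; it replies on the connection the client opened.
function startHub(onReady) {
  const server = net.createServer((socket) => {
    onLines(socket, (message) => socket.write(JSON.stringify({ ack: message.id }) + '\n'));
  });
  server.listen(0, '127.0.0.1', () => onReady(server, server.address().port));
}

// Clients never listen: one outbound connection carries both directions.
function connectClient(port, onMessage) {
  const socket = net.connect(port, '127.0.0.1');
  onLines(socket, onMessage);
  return socket;
}
```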

However as you have pointed out, the underlying transport should be abstracted away entirely.

Re: "As for bi-directional stuff we may be able to improve that, if either end goes down we already have most of the logic in place to re-establish the connections so it should be pretty trivial"

I agree, the underlying transport should handle re-connections, however I would like to make sure that undelivered/failed messages do not silently disappear. In distributed systems failure happens, but not knowing about failure can lead to inconsistencies.
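One common way to make failures visible is to require an acknowledgement for every message and report an error to the sender if none arrives in time. A minimal sketch, assuming a transport shaped as `(message, ack) => void`; `ReliableSender` and the timeout value are hypothetical:

```javascript
// Every send waits for an ack; the callback gets an Error if none arrives
// within timeoutMs, so a lost message is never silent.
class ReliableSender {
  constructor(transport, timeoutMs = 1000) {
    this.transport = transport; // (message, ack) => void
    this.timeoutMs = timeoutMs;
    this.nextId = 0;
  }

  send(body, callback) {
    const id = this.nextId++;
    let settled = false;
    const finish = (err) => {
      if (settled) return; // a late ack after a timeout is ignored
      settled = true;
      clearTimeout(timer);
      callback(err);
    };
    const timer = setTimeout(
      () => finish(new Error(`message ${id} not acknowledged`)),
      this.timeoutMs
    );
    this.transport({ id, body }, () => finish(null));
  }
}
```

A timeout does not prove the message was lost (the ack may be late), so the receiver should tolerate duplicates if the sender retries.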

I re-wrote my requirements doc to reflect the changes.
The code in the project does not reflect these changes yet. I think I need a better way to organize this.

yeah, I'll see if we can expose some of the lower-level messaging stuff so that it could be used in a more traditional server manner. I found the lack of anonymous clients in zmq a little strange at first, but after a while the number of use cases really goes down, since you have full control over the infrastructure; in our case at least, I haven't had much use for that

I think starting a listening port on each host is acceptable. That way peers can talk directly. In fact, at first I think message destinations should just be URLs, e.g.:

  node1.send('axon://10.0.1.12/node2', 'hello')

The supervisor or whatever is responsible for node1 will initiate a connection to 10.0.1.12 and send the message. The supervisor on host 10.0.1.12 will multiplex the incoming messages across available nodes (or actors).

A supervisor could accept multiple protocols based on the URL. So there would be an "AxonProtocol" class which is stored in the supervisor via a hash-map, e.g. { "axon": new AxonProtocol() }. I like this approach because it lets us tweak the protocol without necessarily affecting the message network.
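A sketch of protocol selection by URL scheme, using Node's built-in WHATWG `URL` parser. `Supervisor` and `LoopbackProtocol` are hypothetical; the loopback class just records sends where an `AxonProtocol` would open a socket:

```javascript
// Hypothetical stand-in protocol that records what it was asked to send.
class LoopbackProtocol {
  constructor() { this.sent = []; }
  send(host, target, message) { this.sent.push({ host, target, message }); }
}

// Picks a protocol from the destination URL's scheme, e.g. { axon: ... }.
class Supervisor {
  constructor(protocols) {
    this.protocols = protocols; // scheme -> protocol object
  }

  send(destination, message) {
    const url = new URL(destination);         // 'axon://10.0.1.12/node2'
    const scheme = url.protocol.slice(0, -1); // 'axon:' -> 'axon'
    const protocol = this.protocols[scheme];
    if (!protocol) throw new Error('no protocol for ' + scheme);
    protocol.send(url.hostname, url.pathname.slice(1), message); // '10.0.1.12', 'node2'
  }
}
```

Swapping transports then means swapping one entry in the map, without touching code that calls `send`.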

Here's a diagram of the setup:

[Figure: Network Diagram]

The Switch is responsible for multiplexing incoming messages between nodes, and passing outgoing messages to the appropriate transport. Switches delegate to transport objects like Axon or HTTP to pass messages across the network.
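A sketch of the Switch's two jobs: spread incoming messages for a name across the local nodes registered under it, and hand anything not local to a transport. Round-robin is an assumed policy, and `Switch`, `attach` and `route` are hypothetical names:

```javascript
class Switch {
  constructor(transport) {
    this.transport = transport; // (name, message) => void, for remote delivery
    this.local = new Map();     // name -> { handlers: [], next: 0 }
  }

  // Register one more local node (actor) under a name.
  attach(name, handler) {
    if (!this.local.has(name)) this.local.set(name, { handlers: [], next: 0 });
    this.local.get(name).handlers.push(handler);
  }

  // Local names are multiplexed round-robin; everything else goes out.
  route(name, message) {
    const entry = this.local.get(name);
    if (!entry) {
      this.transport(name, message);
      return;
    }
    const handler = entry.handlers[entry.next];
    entry.next = (entry.next + 1) % entry.handlers.length;
    handler(message);
  }
}
```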

+1 I would love to see a nice implementation of the actor model in node. It seems like you guys have just been discussing server side actors, but actors could run in the web browser as well. It would be really cool if a web app was simply composed of a set of actors and configuration data specifying where the actors should run.

@joshrtay that is an interesting idea. My intent is to build something where layering such behaviour in would be easy, but I would probably not work it into the first release.

I want to make actors cheap so one could write a proxy library that maintains a local actor which proxies requests on behalf of the web browser. The end result would be as you described, but it would keep things simple.

Okay, I have a working, albeit very minimal, library. I have updated the Federation README, and the overall design is as follows:

[Figure: design]

There is a Transport object that acts as the interface between the actor network and whatever transport protocol you need on the wire. I have only included a protocol-less loopback transport as of now.

Check out my working simple example.

Still lots to be done, feedback/comments/pull-requests welcome.

Updates

Federation README has been updated to reflect working changes. There are working examples in the examples directory.

Overview

  • I have a tell and ask pattern working. Tell is fire-and-forget, whereas ask expects a reply.
  • Actors message other actors by name.
  • Names can be anything.
  • A routes.json file can be used to map names to external hosts, or other processes.
  • Names are matched against regular expressions on a first-match basis.
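A sketch of first-match routing. The entry shape (`match` regex source, `to` destination) is an assumption, since the README describes the behaviour rather than this exact format; in practice the table might be loaded with `JSON.parse(fs.readFileSync('routes.json', 'utf8'))`:

```javascript
// Hypothetical routes.json contents: regex -> destination, checked in order.
const routes = [
  { match: '^thumbs/', to: 'axon://10.0.1.12:4000' },
  { match: '^db-.*', to: 'axon://10.0.1.13:4000' },
  { match: '.*-worker$', to: 'process://workers' }
];

// The first matching entry wins; unmatched names stay in the local process.
function resolveDestination(name, table = routes) {
  for (const route of table) {
    if (new RegExp(route.match).test(name)) return route.to;
  }
  return 'local';
}
```

Ordering matters: `thumbs/db-worker` also matches the last entry, but resolves through `^thumbs/` because that entry comes first.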

The system works peer-to-peer, but the changes to allow message-forwarding are minimal.

Right now, it would be great to get some early feedback. Let me know what you think of the idea, and if you have time run the examples.

Thanks!