ostinelli / syn

A scalable global Process Registry and Process Group manager for Erlang and Elixir.

Add :local / :global semantics

meyercm opened this issue

I recognize that being able to share group names / process keys across nodes is the primary purpose; however, there are several cases where I'd like to be able to restrict the sharing to the local node, yet retain the API and functionality of :syn.

Specifically, I'm looking at a homogeneous codebase that is started in N instances; on each of them I'd like to be able to create multiple groups of the form {local_events, EventType}. Using the current API for :syn, I'd need to add a Node key to the tuple, e.g. {local_events, EventType, node()}, to ensure uniqueness across nodes.

The same general usage applies to Keys: I need an {event_coordinator, EventType} on each node, yet unless it is tagged with the node as well, there will be conflicts.

I am happy using {..., node()} as a workaround, but this would be a useful addition to the API in my view.
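For reference, the node() workaround described above can be wrapped in two small helpers. This is only a sketch: the module name local_keys and both function names are hypothetical and are not part of syn.

```erlang
-module(local_keys).
-export([group_name/1, coordinator_key/1]).

%% Hypothetical helpers (not part of syn): append node() so that the
%% same logical name maps to a distinct term on every node.

%% Group name for the node-local events of a given type.
group_name(EventType) ->
    {local_events, EventType, node()}.

%% Registry key for the per-node coordinator of a given event type.
coordinator_key(EventType) ->
    {event_coordinator, EventType, node()}.
```

The resulting terms would then be passed to the existing calls, for example syn:join(local_keys:group_name(EventType), Pid) or syn:register(local_keys:coordinator_key(EventType), Pid).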

Thank you for the feedback @meyercm.

As far as I understand your need, wouldn't being able to retrieve only the local pids of a group be enough for you? Something along the lines of:

syn:get_local_members(Name)

And then publishing to only local members:

syn:publish_to_local(Name, Message)

That would be excellent for groups.

How would you handle local registration? The sticking point seems to be when a process is registered locally with a key, and a different process is registered globally with the same key.

Splitting based on function name could work:

syn:register_local(Key, Pid)
syn:find_by_key_local(Key)

or via arity:

syn:register(Key, Pid, local)
syn:find_by_key(Key, local)

I'm not really keen on adding these functionalities, because there are already a variety of alternatives.

For instance, what is wrong with register/2 itself for your intents?

BIF register/2 requires an atom for the name; syn:register allows arbitrary registration names. I could do dynamic atom generation, but that smells funny, aside from being mildly dangerous.

I was previously using :gproc for this functionality, but my new project is moving into a multi-node setup, where performant solutions are not in abundance (as I saw in your blog post). I could use :gproc locally, and :syn globally, but having two APIs for essentially the same functionality is suboptimal.

For now, I'll just add node() to the end of my local tuples. If you do add syntax supporting local operations, I would surely use it. Thanks for building such an awesome library.

Understood, this makes sense.

Thank you for your feedback, will evaluate and see if others chime in too.

Yes please! :)

The current implementation of the {via, syn, <<"your process name">>} tuple does not allow the developer to specify whether the gen_server process should be global or local, since everything in Syn is global by default (at least, currently).

If we add the global/local distinction as requested here, we also need a way to specify whether a via-registered process is local or global.

Maybe this requires some API semantics to combine with what was requested in #22? Not sure.

cc/ @ephe-meral

Here are two implementation possibilities.

1. local_* prefix

We add all the existing functions with the local_* prefix, such as, for registry functions:

syn:local_register(Key, Pid)
syn:local_find_by_key(Key)
syn:local_find_by_pid(Pid)
[...]

and for process groups functions:

syn:local_join(Name, Pid)
syn:local_get_members(Name)
syn:local_member(Pid, Name)
[...]

2. local diversifier

We could also use the same names but have the local atom diversifier as first argument:

syn:register(local, Key, Pid)
syn:find_by_key(local, Key)
syn:find_by_pid(local, Pid)
[...]

and

syn:join(local, Name, Pid)
syn:get_members(local, Name)
syn:member(local, Pid, Name)
[...]

Any preferences / arguments for one vs the other? I tend to prefer 1 which is closer to Erlang's semantics (see for instance pg2:get_local_members/1) and provides a clean naming separation, but I'm open for discussion.

Also, for the via tuple registration of gen_servers, this would register the gen_server globally (as it is now):

Tuple = {via, syn, <<"your process name">>}.
gen_server:start_link(Tuple, your_module, [], []).

And if you wanted the gen_server to be registered locally:

LocalTuple = {via, syn, {local, <<"your process name">>}}.
gen_server:start_link(LocalTuple, your_module, [], []).

Opinions?

@ostinelli I like nr. 1 and also the tuple solution, because it doesn't break the signature of the existing functions, but rather adds new (special case, so to say) functionality... Though I wonder, is it necessary to specify it for all these functions explicitly? Because maybe it's enough to have it only explicitly when you register a name or join a group, which would make this a little less of a whole new branch of specialized code.
With nr. 2 we'd have the problem that there's a new parameter that needs to be documented and explained, although it will only ever be local or global - even if we hide that behind a default implementation with the 2 standard arguments and that 1st param set to global...

Btw, another option for the gen_server stuff would be to have a separate module for it (although that's maybe not the cleanest solution) - like e.g. syn_local instead of syn.

> Though I wonder, is it necessary to specify it for all these functions explicitly? Because maybe it's enough to have it only explicitly when you register a name or join a group, which would make this a little less of a whole new branch of specialized code.

Ok, let me give you some examples of what I'm thinking. Let's follow what you suggest and specify whether a process is global or local only when you register. If so, on node1:

node1> ok = syn:local_register(my_proc, self()).
node1> syn:register(my_proc, self()).

What should the second expression do?
Return an error, since my_proc is already registered locally?

What if we tried to register the same my_proc key but on a different node2, and globally:

node2> syn:register(my_proc, self()).

What should this expression do? Could we return an error there? Wouldn't blocking a global name because the same key is registered locally on another node (node1) be against the very concept of it being local?

Allowing global to be a completely separate namespace from local would solve this. This is what @meyercm is suggesting above. For example, we could have:

node1> ok = syn:local_register(my_proc, Pid1).
node1> ok = syn:register(my_proc, Pid3).
node2> ok = syn:local_register(my_proc, Pid2).

In this scenario, my_proc would be resolved as Pid1 on node1 and Pid2 on node2 for local queries, but the same Pid3 for all nodes in global queries.

In this case, you'd therefore have to specify whether you're looking for a local or a global value, hence the need for all the additional functions as well.
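To make the separate-namespace idea concrete, here is a toy model using plain maps. It is only an illustration of the semantics being discussed, not how syn stores anything; the module two_spaces and all of its functions are hypothetical, and atoms stand in for real pids.

```erlang
-module(two_spaces).
-export([new/0, register_local/4, register_global/3,
         find_local/3, find_global/2]).

%% Toy model only: two independent maps, so a local and a global
%% registration under the same key never collide.
new() ->
    #{local => #{}, global => #{}}.

%% Local entries are scoped by the node they were registered on.
register_local(Node, Key, Pid, #{local := L} = Reg) ->
    Reg#{local := L#{ {Node, Key} => Pid }}.

%% Global entries are shared by every node.
register_global(Key, Pid, #{global := G} = Reg) ->
    Reg#{global := G#{ Key => Pid }}.

%% Local lookup: resolves only what was registered on the given node.
find_local(Node, Key, #{local := L}) ->
    maps:get({Node, Key}, L, undefined).

%% Global lookup: the same answer regardless of the asking node.
find_global(Key, #{global := G}) ->
    maps:get(Key, G, undefined).
```

Replaying the scenario above against this model, a local lookup of my_proc resolves to a different value on node1 and on node2, while a global lookup resolves to the same value everywhere, which is why every query has to say which space it means.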

Does it make sense? Would this be confusing?

Yep, that's a good point indeed.

Thinking about this, maybe another option would be to encourage the use of syn for global process name registration, and the use of gproc etc. for local name registration? That way we'd have a better separation of concerns and wouldn't need to reinvent the wheel...

@meyercm has raised a clear use case, will think about it a little more.

I would like to have this. I want to use syn as a process registry and for pub/sub on top of riak_core. Since riak_core will deliver the command to the physical node, I don't need global processes.

Hope it adds something to the conversation :)

Considering the way syn has been designed so far, I'd say don't.

syn:get_local_members(Name) and friends would already be plenty enough.

If you have a scenario where you need to register one process per node with the same name, you can easily do it with a group and get_local_members. I'd suggest adding that and seeing what happens. It can be useful for this use case and more.

Proper local/global separation would require a lot more thought and possibly breaking changes here and there.

Thank you for these additional inputs.

@marianoguerra I understand that you currently do not use Syn then, since Syn's main purpose is to allow for global name registry and process groups.

@essen are you suggesting implementing syn:get_local_members/1 for groups only? This doesn't cover the original purpose of this issue though (which is to avoid using two libraries for registration purposes).

Another possibility (just putting it out here) is to have a syn_local module which mimics everything of the syn module, except that it covers only local registration and is not replicated across the cluster.

So, for instance: syn_local:register(Key, Pid).

I played with it in a prototype because I like the API and it covers what I need, but in the riak_core case it wouldn't fit.

Regarding a different lib: if it's easier for you to split into syn and syn_local, both reusing code from a third module (synlib or similar), that would be OK for me.

@ostinelli Yes, just add functions to groups to get/publish to local members and from that you can build a lot of great things without much effort.

@essen though this doesn't disable the distribution of the local group members, which I guess impacts the distributed system as a whole. (Thinking about scaling - sending possibly unnecessary messages doesn't sound so good)

Depends on your use case though. I can see many cases where I want both the pids on all nodes, or the pids only on the local node (obvious example would be sending data everywhere but reading locally).

What are the use cases for local-only, and would they be resulting in a lot of messages being sent? Or would that be negligible for most cases? There's always gproc if you really need it.

Re-reading this whole issue after a while, it seems to me that it is diverging too much from the original scope of syn, which is to be global. I will therefore add, as discussed:

syn:get_local_members(Name)

And then publishing to only local members:

syn:publish_to_local(Name, Message)

This should cover most of the cases. I agree, @ephe-meral, that this might generate some unnecessary noise, but having seen syn resolve conflicts on clusters with millions of registered processes I am not expecting this to be too much of an issue.

This would solve @meyercm's and @marianoguerra's issues, without changing the real nature of syn.

Any input is welcome before I dig in.

I am looking for pg2 replacements, and syn's lack of get_local_members/1 and which_groups/0 (and perhaps get_closest_pid/1 although I could implement myself on top of get_local_members/1 and the existing API) is all that is holding me back from trying syn. So +1 for adding some simple read-only local API.
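As a rough sketch of the "implement it myself" idea above: given a list of member pids, a closest-pid helper can prefer a member on the calling node and fall back to any member otherwise. The module closest, its function names, and the {ok, Pid} | {error, no_members} return shape are all assumptions made for illustration; none of this is syn's or pg2's API.

```erlang
-module(closest).
-export([local_only/1, closest_pid/1]).

%% Keep only the pids that live on the calling node.
local_only(Pids) ->
    [P || P <- Pids, node(P) =:= node()].

%% Prefer a member on this node; otherwise fall back to any member.
%% The return shape is an assumption made for this sketch.
closest_pid(Pids) ->
    case local_only(Pids) of
        [Local | _] ->
            {ok, Local};
        [] ->
            case Pids of
                [Any | _] -> {ok, Any};
                [] -> {error, no_members}
            end
    end.
```

It would be called with the full member list of a group, for example closest:closest_pid(syn:get_members(Name)).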

@dcsommer thanks for the feedback.

Added syn:get_local_members/1,2 and syn:publish_to_local/2 functions in Syn 1.6.0, closing.

Thank you all for your input.