ostinelli / syn

A scalable global Process Registry and Process Group manager for Erlang and Elixir.

Lookup faster than select for ETS

manuel-rubio opened this issue:

Hi, a colleague showed me, using a benchmark, that ets:select/2 is slower than ets:lookup/2:

Quick benchmark:
                Name        Iters         Op/s          Min          P50          Avg          P90          P95          Max
          ets lookup       350000      4525963     201.6 ns     207.8 ns     220.9 ns     224.1 ns     233.9 ns     1.298 us
          ets select        80000       999365     917.3 ns     977.0 ns     1.001 us     996.8 ns     1.032 us     5.244 us

In the implementation of the registry, you use ets:select/2 even when it is not needed:

case ets:select(syn_registry_by_name, [{

What's the advantage? I mean, I can see here:

find_registry_entry_by_name(Name) ->
    case ets:select(syn_registry_by_name, [{
        {{Name, '$2'}, '$3', '_', '_', '_'},
        [],
        ['$_']
    }]) of
        [RegistryTuple] -> RegistryTuple;
        _ -> undefined
    end.

You are using {Name, Pid} as the key, and in the other ETS table, syn_registry_by_pid, you are using {Pid, Name}. What is the improvement over storing them as the first two elements of the tuple?

Hi Manuel,
Regarding the benchmark, please note that syn uses ordered_set tables, which have their own performance characteristics. I'm not saying those results are invalid, just that syn uses a specific ETS table type, and it would be interesting to run the benchmark against syn's specifics rather than draw generic conclusions. If you end up working on some performance improvements I'll be happy to take a look!
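
For anyone who wants to repeat the comparison on an ordered_set table, a minimal sketch could look like the following. The module name, table name, record shape and table size are all hypothetical and are not taken from syn; the numbers it returns depend on the OTP version and the machine, so none are shown here.

```erlang
-module(ets_lookup_vs_select).
-export([run/0, run/1]).

%% Hypothetical micro-benchmark: the table name, record shape and sizes
%% are illustrative only and do not mirror syn's real tables.
run() -> run(100000).

run(N) ->
    Tab = ets:new(bench_by_name, [ordered_set, public]),
    Keys = lists:seq(1, N),
    %% Flat layout: the first tuple element alone is the key.
    true = ets:insert(Tab, [{K, self(), undefined} || K <- Keys]),
    %% timer:tc/1 returns {ElapsedMicroseconds, Result}.
    {LookupUs, ok} = timer:tc(fun() -> lookup_all(Tab, Keys) end),
    {SelectUs, ok} = timer:tc(fun() -> select_all(Tab, Keys) end),
    true = ets:delete(Tab),
    #{lookup_us => LookupUs, select_us => SelectUs}.

%% One ets:lookup/2 per key.
lookup_all(Tab, Keys) ->
    lists:foreach(fun(K) -> [_] = ets:lookup(Tab, K) end, Keys).

%% One ets:select/2 per key, with the key fully bound in the pattern.
select_all(Tab, Keys) ->
    lists:foreach(
        fun(K) -> [_] = ets:select(Tab, [{{K, '_', '_'}, [], ['$_']}]) end,
        Keys).
```

Calling `ets_lookup_vs_select:run().` returns a map with the total time of each loop in microseconds. A single run like this is only a rough indication, not a rigorous benchmark.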

If you're interested, the whole logic of the choice is described in this thread on EQ and it includes explanations from Sverker:
http://erlang.org/pipermail/erlang-questions/2019-December/098868.html

You're right in the case of groups. Reading the email, it's clear that the tuples {Element, Groups} and {Group, Elements} model an N:N relationship, where a group can belong to different elements and an element to different groups. But I was pointing at a completely different case, where we use {Name, Pid} and {Pid, Name} and the cardinality is 1:1. Instead of storing entries as {{Name, Pid}, _, _, _}, they could be stored as {Name, Pid, _, _, _}, taking advantage of ets:lookup/2, which is faster than ets:select/2.
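
To make the suggestion concrete, here is a minimal sketch of the flat layout, assuming a table where the name alone is the key. The module, function and table names are hypothetical, the tuple is shortened to three elements for illustration, and this is not syn's actual code.

```erlang
-module(flat_registry_sketch).
-export([new/0, register_name/4, find_by_name/2]).

%% Hypothetical table: key position is the default (1), so Name is the key.
new() ->
    ets:new(registry_by_name_sketch, [ordered_set, public]).

%% Returns true on success, false if Name is already registered.
register_name(Tab, Name, Pid, Meta) ->
    ets:insert_new(Tab, {Name, Pid, Meta}).

%% Direct key lookup, no match specification needed.
find_by_name(Tab, Name) ->
    case ets:lookup(Tab, Name) of
        [Entry] -> Entry;
        [] -> undefined
    end.
```

With Name as the key, the table itself enforces one entry per name, and find_by_name/2 keeps the same return shape (the tuple or undefined) as the select-based version quoted above.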

Sorry for the late reply Manuel. A process can be registered under multiple names, hence the need for the tuple key {Pid, Name}. For simplicity / symmetry reasons, I've also kept {Name, Pid} in the related table.
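
As an illustration of why the by-pid table needs the composite key, here is a minimal sketch, with hypothetical module, function and table names and a simplified two-element record. It assumes an ordered_set table, where a match pattern whose key is partially bound (here the Pid) can limit how much of the table is traversed.

```erlang
-module(by_pid_sketch).
-export([new/0, add/3, names_of/2]).

%% Hypothetical table keyed by {Pid, Name}, so one pid can own many rows.
new() ->
    ets:new(registry_by_pid_sketch, [ordered_set, public]).

%% Record that Pid is registered under Name.
add(Tab, Pid, Name) ->
    ets:insert(Tab, {{Pid, Name}, undefined}).

%% All names registered for Pid: Pid is bound in the key part of the
%% pattern and the name is captured as '$1'.
names_of(Tab, Pid) ->
    ets:select(Tab, [{{{Pid, '$1'}, '_'}, [], ['$1']}]).
```

A plain ets:lookup/2 cannot answer this query, because it needs the full key and the names are not known in advance.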

Thank you for this input. Syn v3 uses this suggested optimization where possible.
https://hex.pm/packages/syn