Lookup faster than select for ETS
manuel-rubio opened this issue · comments
Hi, a colleague showed me, using a benchmark, that ets:select/2 is slower than ets:lookup/2:
Quick benchmark:
Name        Iters   Op/s     Min       P50       Avg       P90       P95       Max
ets lookup  350000  4525963  201.6 ns  207.8 ns  220.9 ns  224.1 ns  233.9 ns  1.298 us
ets select   80000   999365  917.3 ns  977.0 ns  1.001 us  996.8 ns  1.032 us  5.244 us
In the registry implementation you use ets:select/2 even where it is not needed:
Line 570 in 5c7aeaf
What's the advantage? I mean, I can see here:
find_registry_entry_by_name(Name) ->
    case ets:select(syn_registry_by_name, [{
        {{Name, '$2'}, '$3', '_', '_', '_'},
        [],
        ['$_']
    }]) of
        [RegistryTuple] -> RegistryTuple;
        _ -> undefined
    end.
You are using {Name, Pid} as the key, and in the other ETS table, syn_registry_by_pid, you are using {Pid, Name}. What is the advantage of this over storing them as the first two elements of the tuple?
Hi Manuel,
Regarding the benchmark, please note that syn uses ordered_set tables, which have their own performance characteristics. I'm not saying those results are invalid, just that syn uses a specific ETS table type, and it would be interesting to run the benchmark against syn's specifics rather than draw generic conclusions. If you end up working on some performance improvements I'll be happy to take a look!
If you're interested, the whole reasoning behind the choice is described in this thread on erlang-questions (EQ), which includes explanations from Sverker:
http://erlang.org/pipermail/erlang-questions/2019-December/098868.html
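To apply the comparison to ordered_set tables as suggested above, a rough timing sketch could look like the following. The module name, table names, and tuple layouts are hypothetical and much simpler than syn's real schema. It compares ets:lookup/2 on a plain Name key with ets:select/2 on a {Name, Pid} key whose first element is bound, using timer:tc/1. Treat the numbers as indicative only; a proper benchmark tool would give more reliable results.

```erlang
-module(ets_bench_sketch).
-export([run/0, run/1]).

%% Hypothetical micro-benchmark; not part of syn.
run() -> run(100000).

run(N) ->
    ByName = ets:new(by_name_sketch, [ordered_set]),  %% key: Name
    ByPair = ets:new(by_pair_sketch, [ordered_set]),  %% key: {Name, Pid}
    Pid = self(),
    [begin
         ets:insert(ByName, {I, Pid, undefined}),
         ets:insert(ByPair, {{I, Pid}, undefined})
     end || I <- lists:seq(1, N)],
    %% timer:tc/1 returns {ElapsedMicroseconds, ReturnValue}.
    {LookupUs, ok} = timer:tc(fun() -> lookup_all(ByName, N) end),
    {SelectUs, ok} = timer:tc(fun() -> select_all(ByPair, N) end),
    ets:delete(ByName),
    ets:delete(ByPair),
    #{lookup_us => LookupUs, select_us => SelectUs}.

%% One direct key lookup per stored name.
lookup_all(_Tab, 0) -> ok;
lookup_all(Tab, I) ->
    [_] = ets:lookup(Tab, I),
    lookup_all(Tab, I - 1).

%% One select per stored name, with the key only partially bound.
select_all(_Tab, 0) -> ok;
select_all(Tab, I) ->
    [_] = ets:select(Tab, [{{{I, '_'}, '_'}, [], ['$_']}]),
    select_all(Tab, I - 1).
```

Calling ets_bench_sketch:run() returns a map with the two elapsed times in microseconds, which can then be compared for the table type of interest.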
You're right in the case of groups. Reading the email, it's clear that the tuples {Element, Groups} and {Group, Elements} model an N:N relationship, where a group can belong to different elements and an element to different groups. But I was pointing at a completely different case, where we use {Name, Pid} and {Pid, Name} and the cardinality is 1:1. Instead of storing entries as {{Name, Pid}, _, _, _}, they could be stored as {Name, Pid, _, _, _}, taking advantage of ets:lookup/2, which is faster than ets:select/2.
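A minimal sketch of the layout proposed above, assuming a hypothetical table name and placeholder values for the remaining fields (this is not syn's actual schema). With Name alone as the key, the entry can be fetched with ets:lookup/2 and no match specification:

```erlang
-module(registry_sketch).
-export([demo/0]).

%% Hypothetical table name and tuple layout, for illustration only.
demo() ->
    Tab = ets:new(registry_by_name_sketch, [ordered_set]),
    Pid = self(),
    %% Name is the key (first element); Pid is an ordinary field.
    true = ets:insert(Tab, {my_name, Pid, meta, 0, node()}),
    %% Direct key lookup instead of a select with a partially bound key.
    Found = case ets:lookup(Tab, my_name) of
        [RegistryTuple] -> RegistryTuple;
        [] -> undefined
    end,
    %% An unregistered name yields an empty list.
    Missing = ets:lookup(Tab, other_name),
    true = ets:delete(Tab),
    {Found, Missing}.
```

This only works while each name maps to at most one entry, which is the 1:1 assumption stated above.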
Sorry for the late reply, Manuel. A process can be registered under multiple names, hence the need for the tuple key {Pid, Name}. For simplicity and symmetry, I've also kept {Name, Pid} in the related table.
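To illustrate why the by-pid table needs a composite key, here is a small sketch with a hypothetical table name and a simplified two-element row (not syn's real layout). One process is stored under two names, and a select with only the Pid bound returns both:

```erlang
-module(by_pid_sketch).
-export([demo/0]).

%% Hypothetical table and row shape, for illustration only.
demo() ->
    Tab = ets:new(registry_by_pid_sketch, [ordered_set]),
    Pid = self(),
    %% Same process under two names: the {Pid, Name} key keeps both rows.
    true = ets:insert(Tab, [{{Pid, name_a}, meta_a}, {{Pid, name_b}, meta_b}]),
    %% Partially bound key: Pid is fixed, Name is captured as '$1'.
    Names = ets:select(Tab, [{{{Pid, '$1'}, '_'}, [], ['$1']}]),
    true = ets:delete(Tab),
    Names.
```

If Pid alone were the key of a set or ordered_set table, the second insert would overwrite the first, so only one name per process could be stored.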
Thank you for this input. Syn v3 uses the suggested optimization where possible.
https://hex.pm/packages/syn