Investigate ways to reduce the analysis message matcher overhead

Question

trink opened this issue 7 years ago

Mike Trinkala commented 7 years ago

Some Approaches to Test

In-line all the matchers before performing any analysis
- Result: added complexity and required lua_sandbox API changes without demonstrating a general benefit for most of the current use cases.
Hash router: analyze the matchers and create a hash table lookup for all matchers keying off a particular header/field, e.g. Logger ==
Tree router: analyze the matchers and create a hierarchy of matchers so entire groups of matchers can be eliminated by a single match
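A minimal sketch of the hash-router idea, written in Python purely for illustration (lua_sandbox itself is C). It assumes each matcher can be split into an optional leading equality test on one key field (here Logger) plus a remaining predicate. The class HashRouter, its methods, and the plugin names are hypothetical and are not part of the lua_sandbox API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, str]
Predicate = Callable[[Message], bool]


class HashRouter:
    """Hypothetical hash router.

    Matchers that begin with an equality test on one key field
    (e.g. Logger == 'nginx') are bucketed by the compared value, so a
    single dict lookup selects the only keyed matchers that could succeed.
    Matchers without that equality go to a fallback list and are always
    evaluated.
    """

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        self.buckets: Dict[str, List[Tuple[str, Predicate]]] = defaultdict(list)
        self.fallback: List[Tuple[str, Predicate]] = []

    def add(self, plugin: str, key_value: Optional[str], rest: Predicate) -> None:
        # key_value is the constant compared against key_field, or None
        # if the matcher does not start with such an equality.
        if key_value is None:
            self.fallback.append((plugin, rest))
        else:
            self.buckets[key_value].append((plugin, rest))

    def route(self, msg: Message) -> List[str]:
        # .get() does not insert into the defaultdict, so an unknown
        # value simply yields no keyed candidates.
        keyed = self.buckets.get(msg.get(self.key_field), [])
        return [plugin for plugin, rest in keyed + self.fallback if rest(msg)]


router = HashRouter("Logger")
router.add("nginx_errors", "nginx", lambda m: m.get("Type") == "error")
router.add("nginx_all", "nginx", lambda m: True)
router.add("syslog_all", "syslog", lambda m: True)
router.add("catch_all", None, lambda m: True)
print(router.route({"Logger": "nginx", "Type": "error"}))
# -> ['nginx_errors', 'nginx_all', 'catch_all']
```

The payoff grows with the number of matchers keyed on the same field: a message from the syslog logger never evaluates any of the nginx matchers.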

Mike Trinkala · Answer 1 · Thu Dec 07 2017 00:15:50 GMT+0800 (China Standard Time)

This is a work in progress; experimentation will continue as the schedule allows.

Charlie Vuillemez · Answer 2 · Wed May 02 2018 23:19:03 GMT+0800 (China Standard Time)

+1 :)
I have very long message matchers which are bottlenecks.

Mike Trinkala · Answer 3 · Thu May 03 2018 03:21:24 GMT+0800 (China Standard Time)

There may be some things you can do to optimize specific matchers (order of expressions and types of comparisons). If you can share some problem matchers I will take a look.
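To illustrate why the order of expressions matters, here is a sketch that evaluates an AND (&&) chain with short-circuiting and tallies a cost per test. The costs are made-up relative units, based on the assumption that an equality test on a header is cheaper than a regex over the payload; evaluate_and is a hypothetical helper, not lua_sandbox code.

```python
import re
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
# (label, assumed relative cost, test); the costs are illustrative only
Clause = Tuple[str, int, Callable[[Message], bool]]


def evaluate_and(clauses: List[Clause], msg: Message) -> Tuple[bool, int]:
    """Evaluate clauses as a short-circuit AND; return (result, cost spent)."""
    spent = 0
    for _label, cost, check in clauses:
        spent += cost
        if not check(msg):
            return False, spent  # the first failure ends the evaluation
    return True, spent


cheap = ("Logger == 'nginx'", 1, lambda m: m["Logger"] == "nginx")
costly = ("Payload =~ 'timeout'", 10,
          lambda m: re.search("timeout", m["Payload"]) is not None)

msg = {"Logger": "syslog", "Payload": "connection timeout"}
print(evaluate_and([cheap, costly], msg))  # (False, 1)
print(evaluate_and([costly, cheap], msg))  # (False, 11)
```

Both orders give the same answer, but putting the cheap, selective test first lets most non-matching messages exit before the expensive comparison runs.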

The goal of the remaining items above is to handle many matchers faster by clustering them. That only helps if the large matchers share some conditional expressions that can fail the entire set fast (i.e. if the matchers are relatively unique, this experimentation will not help).
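The clustering idea behind the tree router can be sketched as a tree in which each node holds one condition shared by everything below it, so a single failed test skips the whole group. This is a Python illustration under the assumption that matchers have already been factored into shared prefixes; Node, route, and the plugin names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


@dataclass
class Node:
    """Hypothetical tree-router node: one shared condition guarding a group."""
    condition: Callable[[Message], bool]
    plugins: List[str] = field(default_factory=list)      # fully matched here
    children: List["Node"] = field(default_factory=list)  # refine further


def route(node: Node, msg: Message) -> Tuple[List[str], int]:
    """Return (matched plugin names, number of conditions evaluated)."""
    if not node.condition(msg):
        return [], 1  # one failed test eliminates the whole subtree
    matched, evaluated = list(node.plugins), 1
    for child in node.children:
        child_matched, child_evaluated = route(child, msg)
        matched += child_matched
        evaluated += child_evaluated
    return matched, evaluated


# Two matchers that share the prefix "Logger == 'nginx' && ..."
root = Node(lambda m: m["Logger"] == "nginx", children=[
    Node(lambda m: m["Type"] == "access", plugins=["access_stats"]),
    Node(lambda m: m["Type"] == "error", plugins=["error_alerts"]),
])

print(route(root, {"Logger": "syslog", "Type": "access"}))  # ([], 1)
print(route(root, {"Logger": "nginx", "Type": "error"}))    # (['error_alerts'], 3)
```

As the comment above notes, the savings depend entirely on how much the matchers share: if every matcher has a unique first condition, the tree degenerates into a flat list.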

Charlie Vuillemez · Answer 4 · Thu May 03 2018 16:22:36 GMT+0800 (China Standard Time)

I understand this optimization only applies to analysis plugins (with all threads sharing the same message_matcher)?
In my case I have an output plugin with a long message matcher string:
message_matcher = "(Type =~ '/AAAAA$' || Type =~ '/BBBBB$' || [ ... ] )"
For now I solved the bottleneck by splitting it across multiple plugin instances.
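The splitting workaround can be scripted when the list of Type suffixes is long. The sketch below is a hypothetical Python helper that spreads the OR terms round-robin across N matcher strings, one per plugin instance. AAAAA, BBBBB and CCCCC are placeholders (the real list is elided above), and the sketch assumes each instance can process its subset of Types independently.

```python
from typing import List


def split_matcher(suffixes: List[str], instances: int) -> List[str]:
    """Hypothetical helper: spread a long OR of Type suffix tests
    round-robin across `instances` matcher strings."""
    chunks = [suffixes[i::instances] for i in range(instances)]
    return [
        "(" + " || ".join(f"Type =~ '/{s}$'" for s in chunk) + ")"
        for chunk in chunks
        if chunk  # skip empty chunks when instances > len(suffixes)
    ]


for matcher in split_matcher(["AAAAA", "BBBBB", "CCCCC"], 2):
    print(f'message_matcher = "{matcher}"')
# message_matcher = "(Type =~ '/AAAAA$' || Type =~ '/CCCCC$')"
# message_matcher = "(Type =~ '/BBBBB$')"
```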

Mike Trinkala · Answer 5 · Thu May 03 2018 21:40:37 GMT+0800 (China Standard Time)

Yeah, mozilla-services/lua_sandbox#213 is about all I can squeeze out of a single matcher.

mozilla-services/lua_sandbox#208 may be relevant if the string at the end is unique, so you don't actually need to anchor it: Type =~ '/unique' is multiple times faster than Type =~ '/unique$'.
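The thread does not say what #208 changed, so the following is only one plausible explanation, stated as an assumption. Suppose the matcher takes a fast literal-substring path when the pattern contains no special characters. A trailing '$' then turns the pattern into a "real" pattern and forces a generic engine that, like Lua's string-pattern matcher, retries the match at every start offset instead of jumping to the end of the string. The Python sketch models that dispatch; LUA_SPECIALS, is_literal, naive_match and match are illustrative names, not lua_sandbox functions.

```python
from typing import Tuple

# Special characters in Lua string patterns (per the Lua reference manual).
LUA_SPECIALS = frozenset("^$*+?.([%-")


def is_literal(pattern: str) -> bool:
    """True if the pattern contains no Lua pattern special characters."""
    return not any(ch in LUA_SPECIALS for ch in pattern)


def naive_match(subject: str, literal: str, anchor_end: bool) -> Tuple[bool, int]:
    """Model a generic pattern engine with no end-anchor shortcut.

    Tries every start offset left to right, comparing character by
    character, and returns (matched, character comparisons performed).
    """
    comparisons = 0
    for start in range(len(subject) - len(literal) + 1):
        i = 0
        while i < len(literal):
            comparisons += 1
            if subject[start + i] != literal[i]:
                break
            i += 1
        if i == len(literal) and (not anchor_end or start + i == len(subject)):
            return True, comparisons
    return False, comparisons


def match(subject: str, pattern: str) -> bool:
    """Hypothetical dispatch between a literal fast path and a slow path."""
    if is_literal(pattern):
        # Fast path: plain substring search (str.find runs in optimized C).
        return subject.find(pattern) != -1
    if pattern.endswith("$") and is_literal(pattern[:-1]):
        # Slow path: this sketch only models a literal followed by '$'.
        return naive_match(subject, pattern[:-1], anchor_end=True)[0]
    raise NotImplementedError("sketch handles only literals and a trailing '$'")


type_field = "x" * 100 + "/unique"
print(match(type_field, "/unique"))                          # True
print(match(type_field, "/unique$"))                         # True
print(naive_match(type_field, "/unique", anchor_end=True))   # (True, 107)
```

In this model the anchored pattern costs 107 character comparisons on a 107-character Type, although a direct suffix check would need only 7. An engine that does not special-case '$' never gets to skip to the end, which is consistent with the result reported below.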

Charlie Vuillemez · Answer 6 · Fri May 04 2018 22:17:53 GMT+0800 (China Standard Time)

Yeah, I tested it and it's faster without the trailing "$".
That's surprising: you would usually think the "$" is faster because only the end of the string needs to be checked, not the whole string!