timbray / quamina

Home of Quamina, a fast pattern-matching library in Go

matchSet.addX: hotspot

yosiat opened this issue · comments

Hi,

I am doing some performance tests to see if I can use this library instead of something I built.

On most of my benchmarks Quamina is faster, but there are two main spots that cause terrible degradation:

  • Flattening - I am passing large objects (~7 KB in size, with nesting) and flattening kills the performance; but since I know the list of fields (& paths) being searched on, I can probably reduce the incoming event to a smaller one.
  • matchSet.addX - currently it duplicates the matchSet (even if there are no new matches), and this causes major degradation.

I changed addX locally to update in place instead of creating a new matchSet, and this is the perf difference I see:

Before:

Benchmark_Quamina-10               27054             49177 ns/op           43311 B/op        160 allocs/op

After:

Benchmark_Quamina-10              221080              6717 ns/op            8932 B/op         28 allocs/op
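
To make the change concrete, here is a minimal sketch of the two approaches, with hypothetical stand-in types rather than Quamina's actual matchSet internals:

// X is a stand-in for Quamina's pattern-identifier type.
type X any

// copyOnAddSet mimics the behaviour I'm measuring: every add allocates a
// new set and copies all existing members, even when nothing new is added.
type copyOnAddSet struct{ set map[X]bool }

func (s *copyOnAddSet) addX(exes ...X) *copyOnAddSet {
    next := &copyOnAddSet{set: make(map[X]bool, len(s.set)+len(exes))}
    for x := range s.set {
        next.set[x] = true
    }
    for _, x := range exes {
        next.set[x] = true
    }
    return next
}

// inPlaceSet is the update-in-place variant I benchmarked: no copy,
// and no new set allocation when exes adds nothing new.
type inPlaceSet struct{ set map[X]bool }

func (s *inPlaceSet) addX(exes ...X) *inPlaceSet {
    for _, x := range exes {
        s.set[x] = true
    }
    return s
}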

From what I see, the duplication is done out of concurrency concerns, but looking at

func (m *coreMatcher) matchesForSortedFields(fields []Field) *matchSet {
it looks like there is no concurrent access.

And maybe a simple RWMutex could help us instead of duplication?

I can submit a PR to fix this, but first I'm trying to understand what options we have in hand.

First of all, on the flattening, I suspect it's going to be hard to improve things very much. The code in flatten_json that skips over unwanted fields is pretty fast so it may not be cost-effective to write code to remove fields from events before testing them.

I think the best approach to speeding up flattening would be to use protobufs or Avro or something where there are pointers to the fields so you don't have to scan over all the bytes that aren't interesting. Several people have told me that they think in principle it should be straightforward to write such a hyper-efficient flattener, but AFAIK nobody has done it yet.

Oh, and I should say, thanks for the input!

On the concurrency stuff… that's surprising. Could I ask a favor? Go into concurrency_test.go and try increasing the intensity by increasing the "n" and "tasks" variables, with your update-in-place change?

The concurrency we're worried about is between AddPattern() and MatchesForEvent(). There can only be one AddPattern thread, but there can be many MatchesForEvent threads. The first symptom was one of the MatchesForEvent goroutines reading a map[] while an AddPattern goroutine was updating it; the runtime throws a specific panic for that. Second symptom: same, only for a slice, except this eventually causes an invalid-index panic. The second symptom was much rarer and harder to reproduce.

The update-by-replace in matchSet and other places fixed the first problem, and then the atomic.Value in coreMatcher, fieldMatcher, and valueMatcher fixed the second. Possibly the atomic.Value also removed the need for the fancy matchSet? That doesn't seem true in my mind, but I could easily be wrong.
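
For context, the shape of that fix is copy-on-write with an atomic swap. A generic sketch - not the actual Quamina code - looks like this:

import "sync/atomic"

// cowState stands in for the immutable state a matcher publishes.
type cowState struct {
    transitions map[string]int
}

type matcher struct {
    state atomic.Value // always holds a *cowState
}

func newMatcher() *matcher {
    m := &matcher{}
    m.state.Store(&cowState{transitions: map[string]int{}})
    return m
}

// Readers (the MatchesForEvent goroutines) only ever load a snapshot;
// they never see a map that is being written to.
func (m *matcher) snapshot() *cowState {
    return m.state.Load().(*cowState)
}

// The single writer (AddPattern) copies, mutates the copy, then swaps.
func (m *matcher) update(key string, val int) {
    old := m.snapshot()
    next := &cowState{transitions: make(map[string]int, len(old.transitions)+1)}
    for k, v := range old.transitions {
        next.transitions[k] = v
    }
    next.transitions[key] = val
    m.state.Store(next)
}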

I wonder if the case where matchSet.addX() is adding no new values is easy to detect and simply return the input? I'll have a look at that.

BUT now it dawns on me that the matchSet used in the AddPattern thread - the variable matches in matchesForSortedFields() - is accessed only in that thread, and I'm sure it could safely use update-in-place. If you wanted to create a new type for that purpose, I'm pretty sure I'd take that PR as a first step.

Hi,

Thanks for the responses! I will give them a detailed look & answers tomorrow (or the day after).

I wonder if the case where matchSet.addX() is adding no new values is easy to detect and simply return the input? I'll have a look at that.

From my data, before:

Quamina-10               26460             47247 ns/op           43318 B/op        160 allocs/op

After adding a len check on exes in addX:

Quamina-10               35757             33191 ns/op           30393 B/op        129 allocs/op

It gives a good improvement, but it's not enough for me to replace my existing implementation.

If you wanted to create a new type for that purpose, I'm pretty sure I'd take that PR as a first step.

You are suggesting a new type which is the same as matchSet but does update-in-place? It sounds OK, but what do you think about one method vs. two methods?

// one method option
addX(updateInPlace bool, exes ...X) *matchSet

// two methods option
addX(exes ...X) *matchSet
unsafeAddX(exes ...X) *matchSet // does update-in-place

First of all, on the flattening, I suspect it's going to be hard to improve things very much. The code in flatten_json that skips over unwanted fields is pretty fast so it may not be cost-effective to write code to remove fields from events before testing them.

About flattening: I admit I haven't looked much at the code, what it returns, and how it works, but in my solution I am getting those large events and managing to work with them really efficiently.

I achieve that by using jsoniter / jx, which allows me to read a subset of the event; instead of flattening, I am doing lazy reads against that JSON.

What helped me a lot here is to restructure my object, so instead of:

{
    "field1": "a", // have match against
    "field2": "b", // have match against
    ... rest of the 7kb data ..
}

I am doing:

{
     "context": { "field1": "a", "field2": "b" },
     "payload": { ... 7kb data .. }
}

With this approach jsoniter/jx doesn't need to traverse all of the properties; it just gets to the context property and that's it.
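
To illustrate the same idea with only the standard library (this is a sketch, not my actual jx code): a streaming decoder can return as soon as it has read "context" and never parse "payload":

import (
    "encoding/json"
    "strings"
)

// readContext decodes only the top-level "context" object and returns,
// so a trailing "payload" is never parsed. It assumes "context" appears
// before the big fields, which is exactly the restructuring above.
func readContext(event string) (map[string]any, error) {
    dec := json.NewDecoder(strings.NewReader(event))
    if _, err := dec.Token(); err != nil { // consume the opening '{'
        return nil, err
    }
    for dec.More() {
        keyTok, err := dec.Token()
        if err != nil {
            return nil, err
        }
        if key, ok := keyTok.(string); ok && key == "context" {
            var ctx map[string]any
            err = dec.Decode(&ctx) // decode just this one value
            return ctx, err        // stop here; the rest of the event is untouched
        }
        var skip json.RawMessage // step over values we don't care about
        if err := dec.Decode(&skip); err != nil {
            return nil, err
        }
    }
    return nil, nil
}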

I'll give flattening a deeper look and see why this restructuring doesn't help Quamina, since essentially the flattener should skip the whole "payload" object.

Again, thanks for your feedback and for this awesome library!

Haven't seen your code but I assume you mean something like

  if len(exes) == 0 {
    return m
  }

then yes, please include that in any PR.

Yes, two methods on the existing matchSet class is probably better. Maybe addXSingleThreaded() or some such?

Don't know jsoniter or jx but I'm super-interested in anything that could make flattening faster because last time I profiled it was still >50% of matching latency. So I look forward to hearing what you discover.

Submitted a PR for the matchSet changes - #109. By the way, I see there is no formatting enforced with gofmt - would you accept a PR for running gofmt on the files and ensuring that on CI?

Regarding the flattener, I have looked into it, and given this pattern:

{
	"context": {
		"field1": ["a"],
		"field2": ["b"]
	}
}

Event:

{
     "context": { "field1": "a", "field2": "b" },
     "payload": { ... 7kb data .. }
}

It looks like the flattener goes into "payload" and looks for properties called "context" / "field1" / "field2", which makes it not very efficient for my case (and maybe others as well). Instead, what I suggest (sketched after the list) is to:

  1. Keep track of the "paths" used by patterns - in my case context.field1 / context.field2
  2. In the flattening phase, pluck only those paths and ignore all others.
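
A rough sketch of the plucking idea, using only encoding/json - this is hypothetical illustration code, not my jx flattener and not Quamina's:

import (
    "encoding/json"
    "fmt"
)

// pathTree marks which branches of the event we care about;
// a nil child means "take this leaf as a value to match on".
type pathTree map[string]pathTree

// pluck keeps only the wanted paths and never descends into branches
// (like "payload") that no pattern references. For brevity this works on
// an already-unmarshalled event; a streaming flattener would additionally
// skip parsing the unwanted branches altogether.
func pluck(event map[string]any, wanted pathTree, prefix string, out map[string]any) {
    for key, sub := range wanted {
        val, ok := event[key]
        if !ok {
            continue
        }
        name := prefix + key
        if sub == nil {
            out[name] = val
            continue
        }
        if child, ok := val.(map[string]any); ok {
            pluck(child, sub, name+".", out)
        }
    }
}

func example() {
    wanted := pathTree{"context": {"field1": nil, "field2": nil}}
    var event map[string]any
    _ = json.Unmarshal([]byte(`{"context":{"field1":"a","field2":"b"},"payload":{"big":"..."}}`), &event)
    out := map[string]any{}
    pluck(event, wanted, "", out)
    fmt.Println(out) // map[context.field1:a context.field2:b]
}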

Good catch on gofmt. I have my IDE wired up for gofmt-on-save, and my bad for not noticing the miss. Yes, I'd love a CI PR for that. But I thought we already had something - @embano1, am I wrong on that?

Yeah, at the current time the flattener traverses the event end-to-end to be sure it hasn't missed any relevant fields. Which gives me an idea: enhance the NameTracker interface to give a list of all the fields in play and then the Flattener could know when it's found everything and stop looking. I'd be a little worried that for short simple events this could actually slow things down, so it might make sense to have a heuristic and only do this check for events larger than a certain threshold.

Good catch on gofmt. I have my IDE wired up for gofmt-on-save, and my bad for not noticing the miss. Yes, I'd love a CI PR for that. But I thought we already had something - @embano1, am I wrong on that?

It's because the linter wasn't enabled in golangci; I have enabled it here - #110 - and fixed all the linting errors.

enhance the NameTracker interface to give a list of all the fields in play and then the Flattener could know when it's found everything and stop looking.

This is exactly what I have in mind. How do you think we should implement such an API? It means we need to store the list of pattern fields in coreMatcher and expose them via the NameTracker interface. The problem that comes to mind is how to handle deletions, because then we need some kind of ref-count, which makes things complex.
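
For example, a per-path ref-count is conceptually simple - this is a hypothetical sketch, nothing like it exists in Quamina today - but wiring it into pattern deletion is the part I'm unsure about:

// pathRefs counts how many live patterns reference each path, so a path
// only drops out of the flattener's wanted set when no pattern needs it.
type pathRefs struct {
    counts map[string]int
}

func (p *pathRefs) addPattern(paths []string) {
    for _, path := range paths {
        p.counts[path]++
    }
}

func (p *pathRefs) deletePattern(paths []string) {
    for _, path := range paths {
        if p.counts[path]--; p.counts[path] <= 0 {
            delete(p.counts, path)
        }
    }
}

// wanted is what would get handed to the flattener.
func (p *pathRefs) wanted() map[string]bool {
    out := make(map[string]bool, len(p.counts))
    for path := range p.counts {
        out[path] = true
    }
    return out
}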

I'll try to implement a POC of an external flattener which accepts a list of paths and does the flattening using jx / jsoniter. I'll check small objects (where patterns cover 90% of properties) and large objects, and will report back with a code sample and my findings.

Some quick update:

I wrote another flattener based on jx, which accepts "paths" (tree-based paths to pluck) -

goos: darwin
goarch: arm64
pkg: github.com/yosiat/quamina-flatenner
Benchmark_Quamina_JxFlattener-10                  143412              7351 ns/op              24 B/op          3 allocs/op
Benchmark_Quamina_Flattener-10                     83698             14388 ns/op             360 B/op         23 allocs/op
Benchmark_LargePayload_QuaminaFlattner-10           77971             15326 ns/op            1240 B/op         44 allocs/op
Benchmark_LargePayload_JxFlattner-10               145306              8318 ns/op             904 B/op         24 allocs/op
PASS
ok      github.com/yosiat/quamina-flatenner     5.347s

Tomorrow I'll run the Quamina flattener tests against it to make sure it's compliant, and then I'll push the code to GitHub (to a repo) so you can take a look at it.

Update

It can be improved further -

goos: darwin
goarch: arm64
pkg: github.com/yosiat/quamina-flatenner
Benchmark_Quamina_JxFlattener-10                 2612365               437.9 ns/op            24 B/op          3 allocs/op
Benchmark_Quamina_Flattener-10                     83246             14347 ns/op             360 B/op         23 allocs/op
Benchmark_LargePayload_QuaminaFlattner-10           78272             15307 ns/op            1240 B/op         44 allocs/op
Benchmark_LargePayload_JxFlattner-10               876013              1365 ns/op             904 B/op         24 allocs/op
PASS
ok      github.com/yosiat/quamina-flatenner     5.727s

What is the first column in the benchmark output? Ideally we'd like to run some of the existing Citylots benchmarks replacing json_flattener with yours and compare the performance.

Just one caveat: I am very very reluctant to add dependencies to Quamina, because I see this as a very low-level and horizontal library. I have spent years fighting through dependency-management hell and seen enough horrible security disasters that I am personally reluctant to adopt libraries that have much in the way of dependencies.

What is the first column in the benchmark output?

It's the standard output of Go benchmarks; the first column is the number of iterations the benchmark ran.

Ideally we'd like to run some of the existing Citylots benchmarks replacing json_flattener with yours and compare the performance.

Where do they exist? I can run them and publish the results here until I publish the full code.

Just one caveat: I am very very reluctant to add dependencies to Quamina, because I see this as a very low-level and horizontal library.

Totally makes sense and is acceptable. I used jx since it was faster and easier to get to a result which shows my point. I assume the changes I did can be done with encoding/json; it was just too complex for me to change the existing code to get to my result.

Once I publish my code, I believe you will easily understand what I did, how I improved the performance, and what you can do to adjust the existing code for it.

Have a look at benchmarks_test.go. Unfortunately I previously didn't know about the built-in Go benchmarking support so they don't yet take advantage of that.

Cool, I looked at it and I'll need to do some adjustments to my code to make it work.

Going to sleep, will do it tomorrow ~

This is exactly what I have in mind. How do you think we should implement such an API? It means we need to store the list of pattern fields in coreMatcher and expose them via the NameTracker interface. The problem that comes to mind is how to handle deletions, because then we need some kind of ref-count, which makes things complex.

The first/easiest thing I thought of would be to have an API in NameTracker like

func (nt *NameTracker) GetFieldSet() map[string]bool 

The idea is that it would return a "set" of all the field names that are used and, for convenience, the set would be writable: whenever you encounter one of those fields in the event, you remove it from the set, and as soon as the set is empty you stop parsing the event. It would be a bit expensive to generate, but you're only going to be using this API on big events, so maybe that's OK? But I haven't thought about it a lot; quite likely you have a better idea.
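
The consuming side might look something like this - purely illustrative, with hypothetical names:

// fieldSet is the writable set from the hypothetical GetFieldSet().
// fieldNames stands in for the names encountered while scanning an event.
func scanUntilDone(fieldNames []string, fieldSet map[string]bool) []string {
    var found []string
    for _, name := range fieldNames {
        if fieldSet[name] {
            found = append(found, name)
            delete(fieldSet, name) // seen it, stop looking for it
        }
        if len(fieldSet) == 0 {
            break // everything the matcher cares about has been found
        }
    }
    return found
}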

BTW, I think your technique of moving the interesting fields up to the front of the event is very clever, but probably not a reasonable thing to ask users to do in the general case. But this new API might be a good idea anyhow.

Hi,

I ran benchmarks_test.go and saw only one benchmark, for BigShellStyle -

# Quamina
Field matchers: 27 (avg size 1.000, max 1), Value matchers: 1, SmallTables 54 (avg size 15.500, max 28), singletons 0
428,547.72 matches/second with letter patterns

# Jx
Field matchers: 27 (avg size 1.000, max 1), Value matchers: 1, SmallTables 54 (avg size 15.500, max 28), singletons 0
1,367,947.02 matches/second with letter patterns

I have published the code up here - https://github.com/yosiat/quamina-flatenner
I made some hacks in Quamina to expose some internal methods in order to run the benchmarks externally, so it will be a bit hacky to run it locally, but what's needed is:

  1. Clone the quamina-flatenner project
  2. Install https://github.com/rogpeppe/gohack
  3. Run go build and then gohack get -vcs -f github.com/timbray/quamina
  4. In the gohack folder, check out this branch: https://github.com/yosiat/quamina/tree/flat-hack (flat-hack).
  5. And then simply run go test -v

I'll make this flow easier later today; instead of an external repo I'll put it inside a fork branch.

Some notes:

  • In my "flat-hack" branch, I added a used-paths map, same as namesUsed, in order to get the list of paths passed to the flattener.
  • I haven't checked the full correctness of my flattener - benchmarks_test.go and my little tests are passing, but hopefully soon in a fork branch I'll be able to get the existing tests running (and then passing ;) )
  • The benchmarks I posted above I currently can't share because they contain internal data; I'll anonymise it and post the benchmarks as well.

A bit about the source:

Update:

I pushed the sources to my side branch, so it will be easier to use.

  1. Checkout https://github.com/yosiat/quamina/tree/flat-hack
  2. Run go build (to install jx)
  3. To run benchmarks, I prefixed them with Test_JX - so go test -v -run "^Test_JX" will suffice.

Oops… the benchmark we most care about isn't in benchmarks_test (sorry); it's TestCityLots in quamina_test.go. It's very strongly dependent on flattener performance.

I have to go do other stuff for a while, will get back to this.

OK, now the difference is getting smaller -

=== RUN   TestCityLots
Field matchers: 7 (avg size 1.500, max 3), Value matchers: 6, SmallTables 0 (avg size n/a, max 0), singletons 6

173,434.09 matches/second

--- PASS: TestCityLots (1.64s)
=== RUN   TestCityLots_JX
Field matchers: 7 (avg size 1.500, max 3), Value matchers: 6, SmallTables 0 (avg size n/a, max 0), singletons 6

177,609.63 matches/second

And the profiles are the same; most of the time is in storeArrayElementField.

Yeah, the Patterns applied in that benchmark force the flattener to process the whole record, which includes some large-ish arrays of floating-point numbers, so it's pretty well a worst-case. But actually, there are two distinct problems:

  1. Making the flattener efficient when processing lots of fields
  2. Making the flattener smarter about skipping fields it doesn't need to look at (what you've been working on mostly)

OK, so once again, thanks for this work - once we've landed this PR I'll add you as a project contributor. Now, a confession: You've been doing so much work that I've sort of lost track of which PRs I should review and when. Feel free to send me a message, email or Signal, when you'd like me to take a close look at something.

OK, so once again, thanks for this work - once we've landed this PR I'll add you as a project contributor.

Thanks a lot!

I've sort of lost track of which PRs I should review and when

In terms of PRs, we have:

  • #112 - fixing linter
  • #109 - matchSet.addX performance; once the linter fix is merged I'll rebase and adapt

And then we need to decide how to proceed with the flattener changes. My approach is clear - track the list of paths used and flatten only those - but I am not sure how to proceed here:

  • Tracking the list of paths is easy when we are considering additions only, but when we consider removals (of patterns) it becomes complex, because then we need to make sure to remove a path only when it's no longer used by any pattern. I haven't looked at how deletions currently work, so I have no idea on this.
  • My flattener uses jx, and you said that you don't want an external dependency, which makes sense. Doing this with the existing flattener will be complex; I can try to make it work, but it will take some time to get my head around the existing parser.

So my work around the flattener will allow me to use an external one (mine) to continue conducting my tests in the internal project and see its end-to-end performance. But I am not sure it's ready to be something we can review/merge.

Feel free to send me a message, email or Signal.

Thanks! I'll keep the discussions here in the open on GitHub, and if a case arises I'll send you an email (I have already sent you one, so you have my private address).

We can probably close this issue, but maybe start another one on flattening, because there is useful discussion in here. I just had another idea: Instead of making NameTracker smarter, we could consider a Flattener implementation that took config options, for example "when you come to this field you can stop, nothing useful after it". Because it sounds like skipping parts of the event is only really useful when you have hand-crafted events and the caller has inside knowledge about the structure.
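
Something along these lines, purely as a sketch - none of these names exist today:

// FlattenerOptions is a hypothetical config for a skipping-aware Flattener.
type FlattenerOptions struct {
    // StopAfterField: once this top-level field has been processed the
    // flattener may stop; the caller promises nothing useful follows it.
    StopAfterField string
    // SkipFields: top-level fields the flattener may step over unparsed.
    SkipFields []string
}

// Example: a caller with the "context"/"payload" layout discussed above.
// opts := FlattenerOptions{StopAfterField: "context", SkipFields: []string{"payload"}}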

Closing this issue and, as suggested, opened a new one for flattening - #113.