Generate data with zero allocs

Question

matheusd opened this issue 4 months ago · comments

Matheus Degiovani commented 4 months ago

How can I generate/encode data with zero heap allocations?

Example use case: I want to generate data to write into a file, so I should only need to allocate a single struct, message, and arena, and I can pre-allocate the arena buffer. But every loop I have tried fails to be zero-alloc: either the arena grows or the message escapes to the heap.

Here's the set of test benchmarks: https://github.com/matheusd/capnptest01/blob/master/bench_test.go

And here are my results:

BenchmarkSetText01-7    6966074     174.9 ns/op     99 B/op    0 allocs/op
BenchmarkSetText02-7    1000000      1100 ns/op    352 B/op    5 allocs/op
BenchmarkSetText03-7    2315973     517.9 ns/op     72 B/op    2 allocs/op
BenchmarkSetText04-7    2661026     429.9 ns/op    260 B/op    0 allocs/op

Benchmarks 01 and 04 report 0 allocs/op, but that is only because the arena's growth is amortized over many iterations: the arena ends up very large in those tests, as the non-zero B/op shows.
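To illustrate why amortized growth can hide behind a 0 allocs/op figure, here is a minimal sketch that simulates a buffer doubling its capacity whenever it fills up. The doubling policy, the 64-byte write size, and the 4096-byte initial capacity are assumptions chosen for illustration; they are not the arena's actual growth strategy.

```go
package main

import "fmt"

// growthsFor simulates a buffer that doubles its capacity whenever it is
// full and reports how many reallocations n writes of size bytes trigger.
// The doubling policy is an assumption for illustration only.
// initialCap must be greater than zero.
func growthsFor(n, size, initialCap int) int {
	capacity, used, growths := initialCap, 0, 0
	for i := 0; i < n; i++ {
		for used+size > capacity {
			capacity *= 2
			growths++
		}
		used += size
	}
	return growths
}

func main() {
	const n = 1_000_000
	g := growthsFor(n, 64, 4096)
	// Integer division, as used when allocs/op is reported.
	fmt.Printf("growths=%d allocs/op=%d\n", g, g/n)
}
```

With these numbers the buffer reallocates 14 times over a million iterations, so the per-operation count rounds down to zero even though the buffer keeps growing.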

Louis Thibault · Answer 1 · Sat Mar 02 2024 02:35:57 GMT+0800 (China Standard Time)

If you call Release() on your message, the underlying buffers are returned to a sync.Pool. This is the recommended way to avoid allocations. I am not certain that there is a way to guarantee zero-allocation (un)marshalling, though the Arena API is designed to support it in principle.
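As a rough sketch of the pooling pattern described here, using only the standard library: a buffer is borrowed from a sync.Pool, used, and put back. The names bufPool and encode are hypothetical and this is not the library's internal code. Note that sync.Pool may drop idle items, so pooling reduces allocations without guaranteeing there are none.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable byte slices. A pointer to the slice is stored
// so that putting it back does not itself allocate.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// encode borrows a buffer, writes the payload into it, and returns the
// buffer to the pool. It reports how many bytes were written.
func encode(payload string) int {
	bp := bufPool.Get().(*[]byte)
	buf := append((*bp)[:0], payload...)
	n := len(buf)
	*bp = buf // keep any growth for the next user
	bufPool.Put(bp)
	return n
}

func main() {
	fmt.Println(encode("my own descr"))
}
```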

Matheus Degiovani · Answer 2 · Sat Mar 02 2024 19:35:31 GMT+0800 (China Standard Time)

AFAICT, my BenchmarkSetText03() was supposed to do that (by calling msg.Reset(arena)). Indeed, the arena itself does not grow, but some objects (apparently the *Segment created inside Message.setSegment()) still escape to the heap.

func BenchmarkSetText03(b *testing.B) {
	var msg capnp.Message
	arena := capnp.SingleSegment(nil)

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		seg, err := msg.Reset(arena)
		if err != nil {
			b.Fatal(err)
		}

		tx, err := NewTransaction(seg)
		if err != nil {
			b.Fatal(err)
		}

		err = tx.SetDescription("my own descr")
		if err != nil {
			b.Fatal(err)
		}
	}

	// b.Log(arena.String())
}

Matheus Degiovani · Answer 3 · Sat Mar 02 2024 20:08:02 GMT+0800 (China Standard Time)

I notice that, API-wise, msg.Reset() has an issue: I can't be completely in control of the arena. msg.Reset() unconditionally calls arena.Release(), which puts the arena back into the sync pool, even when I know I'll reuse the arena and have no need to go through the arena sync pool.

This is easily solved by instantiating my own arena type that NOPs on Release().
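A minimal sketch of that idea, assuming a cut-down interface: the arena interface below models only Release() and is a hypothetical stand-in for the library's Arena, and pooledArena and keepArena are illustrative names.

```go
package main

import "fmt"

// arena is a hypothetical, cut-down stand-in for the library's Arena
// interface; only the method relevant here is modelled.
type arena interface {
	Release()
}

// pooledArena mimics an arena whose Release gives its buffer away.
type pooledArena struct{ buf []byte }

func (a *pooledArena) Release() { a.buf = nil }

// keepArena wraps an arena and turns Release into a no-op, so the caller
// keeps ownership of the underlying buffer across resets.
type keepArena struct{ inner *pooledArena }

func (keepArena) Release() {}

func main() {
	inner := &pooledArena{buf: make([]byte, 0, 1024)}
	var a arena = keepArena{inner: inner}
	a.Release()                 // no-op: the buffer is not given up
	fmt.Println(cap(inner.buf)) // capacity is still 1024
}
```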

Matheus Degiovani · Answer 4 · Sat Mar 02 2024 20:49:42 GMT+0800 (China Standard Time)

Ok, so the first escape to heap happens because, the second time Message.Segment() is executed, m.segs != nil but len(m.segs) == 0: the first segment (which is statically allocated) has been removed from the segs map, so a new Segment gets initialized. The following diff fixes this issue:

diff --git a/message.go b/message.go
index 5ae7f3e..4bcceaa 100644
--- a/message.go
+++ b/message.go
@@ -100,6 +100,9 @@ func (m *Message) Release() {
 func (m *Message) Reset(arena Arena) (first *Segment, err error) {
 	m.capTable.Reset()
 	for k := range m.segs {
+		if k == 0 && m.segs[k] == &m.firstSeg {
+			continue
+		}
 		delete(m.segs, k)
 	}
 
@@ -113,6 +116,7 @@ func (m *Message) Reset(arena Arena) (first *Segment, err error) {
 		DepthLimit:    m.DepthLimit,
 		capTable:      m.capTable,
 		segs:          m.segs,
+		firstSeg:      Segment{msg: m},
 	}
 
 	if arena != nil {

I noticed that Message.allocSegment requires m.segs to be non-nil, so it seems to me that Reset should simply create m.segs if it is nil and populate it with the first segment. That removes the need to consider the two cases throughout the code and makes it easier to reason about. What do you think?
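The suggestion can be sketched in miniature as follows. The message and segment types here are hypothetical simplifications, not the library's real structures: segment 0 is embedded in the struct, and reset always leaves the map non-nil with that first segment in place.

```go
package main

import "fmt"

type segment struct{ id int }

// message is a hypothetical miniature of the structure under discussion:
// segment 0 lives inside the struct, other segments are created on demand.
type message struct {
	firstSeg segment
	segs     map[int]*segment
}

// reset drops every segment except the embedded first one and creates the
// map on first use, so later code never sees a nil or empty map.
func (m *message) reset() *segment {
	if m.segs == nil {
		m.segs = make(map[int]*segment, 1)
	}
	for k := range m.segs {
		if k != 0 {
			delete(m.segs, k)
		}
	}
	m.firstSeg = segment{id: 0}
	m.segs[0] = &m.firstSeg
	return &m.firstSeg
}

func main() {
	var m message
	a := m.reset()
	m.segs[1] = &segment{id: 1} // simulate a second segment being added
	b := m.reset()
	fmt.Println(a == b, len(m.segs)) // same first segment, one map entry
}
```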

If the previous diff is reasonable, I can send it as a proper PR.

Matheus Degiovani · Answer 5 · Sat Mar 02 2024 21:37:03 GMT+0800 (China Standard Time)

Ok, to fix the second escape to heap I had to create a new Arena that does not use the bufferpool to manage its memory: https://github.com/matheusd/capnptest01/blob/fd2e71a57e5c1ccf8a42f28804ba8f85129002d9/arena.go#L35

This makes it possible to encode messages without any heap allocations (other than the initial buffer if correctly sized).
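For readers who do not follow the link, here is a rough stdlib-only sketch of the general technique: a single-segment arena over a caller-owned buffer that never grows and never touches a pool. It is not the linked ManualSegmentArena; fixedArena, allocate, reset, and encodeAllocs are hypothetical names, and the real Arena interface has more methods. testing.AllocsPerRun is used to check that the encode loop does not allocate.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// fixedArena is a hypothetical single-segment arena over a caller-owned
// buffer. len(buf) is the number of bytes handed out so far.
type fixedArena struct{ buf []byte }

var errFull = errors.New("fixedArena: buffer exhausted")

// allocate returns sz zeroed bytes from the buffer, or errFull if they do
// not fit. It never grows the buffer.
func (a *fixedArena) allocate(sz int) ([]byte, error) {
	start := len(a.buf)
	if sz < 0 || start+sz > cap(a.buf) {
		return nil, errFull
	}
	a.buf = a.buf[:start+sz]
	out := a.buf[start : start+sz]
	for i := range out {
		out[i] = 0 // the buffer is reused, so clear stale data
	}
	return out, nil
}

// reset makes the whole buffer available again without freeing it.
func (a *fixedArena) reset() { a.buf = a.buf[:0] }

// encodeAllocs reports the average heap allocations per encode cycle.
func encodeAllocs(a *fixedArena) float64 {
	return testing.AllocsPerRun(100, func() {
		a.reset()
		b, err := a.allocate(16)
		if err != nil {
			panic(err)
		}
		copy(b, "my own descr")
	})
}

func main() {
	a := &fixedArena{buf: make([]byte, 0, 1024)}
	fmt.Println(encodeAllocs(a))
}
```

The only allocation is the initial make, which happens before the measured loop. The trade-off is that an undersized buffer produces an error instead of growing.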

Is ManualSegmentArena something that would be useful to contribute to go-capnp?

Louis Thibault · Answer 6 · Mon Mar 04 2024 01:23:43 GMT+0800 (China Standard Time)

@matheusd Both of your suggestions seem sensible to me, and a PR is most welcome. Please be aware that I am currently traveling, so I will be slower to respond/review than usual.

Looking forward to the PR :)

Matheus Degiovani · Answer 7 · Wed Mar 13 2024 04:33:13 GMT+0800 (China Standard Time)

(No rush to respond, I'm aware you're travelling)

I've sent the first PR to address this (#555).

I previously wrote:

I noticed that Message.allocSegment requires m.segs to be non-nil, so it seems to me that Reset should simply create m.segs if it is nil and populate it with the first segment. That removes the need to consider the two cases throughout the code and makes it easier to reason about.

But the more I think about it, and the more I look at a CPU profile with the heap allocs fixed, the more I think that segs should really be a function of (and therefore moved to) the Arena implementations.

In particular, this would allow implementing an arena that forgoes the Message.mu protection, getting a performance boost at the expense of the caller having to ensure no concurrent access to the same message.

For example, in the following profile of my benchmark, Reset() takes over 50% of the time while doing basically nothing useful: the arena is pre-sized, so only a single segment will ever be used, and all access happens in a single goroutine.

Rewriting to use an Arena that does not use the map and mutex would make Reset() essentially free.
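To make the contrast concrete, here is a small sketch with two hypothetical types: lockedMsg resets under a mutex and clears a segment map, while singleMsg assumes one segment and one goroutine, so its reset is a single slice re-slice. Neither type reflects the library's actual or eventual design.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedMsg models the general case: many segments, guarded by a mutex.
type lockedMsg struct {
	mu   sync.Mutex
	segs map[int][]byte
}

// reset clears the segment map under the lock and installs buf as segment 0.
func (m *lockedMsg) reset(buf []byte) []byte {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.segs == nil {
		m.segs = make(map[int][]byte, 1)
	}
	for k := range m.segs {
		delete(m.segs, k)
	}
	m.segs[0] = buf[:0]
	return m.segs[0]
}

// singleMsg models the special case: one segment, one goroutine.
// Callers must guarantee there is no concurrent access.
type singleMsg struct{ seg []byte }

// reset truncates the single segment; no map or lock is involved.
func (m *singleMsg) reset(buf []byte) []byte {
	m.seg = buf[:0]
	return m.seg
}

func main() {
	buf := make([]byte, 16, 64)
	var lm lockedMsg
	var sm singleMsg
	a, b := lm.reset(buf), sm.reset(buf)
	fmt.Println(len(a), cap(a), len(b), cap(b))
}
```

Both resets leave an empty segment with the full capacity available; the second simply skips the bookkeeping that a single-goroutine, single-segment caller does not need.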

Louis Thibault · Answer 8 · Sat Mar 16 2024 22:22:13 GMT+0800 (China Standard Time)

Hey @matheusd Thanks for investigating this. Your proposal sounds reasonable. I don't think this will break any existing behavior ... wanna give it a shot?

Matheus Degiovani · Answer 9 · Mon Mar 18 2024 18:36:26 GMT+0800 (China Standard Time)

Yeah, I've started work on this. I only work on this as time permits, so no hard deadline on when I'll submit, but I think I've got a rough high level design going already.

Thank you for the support!

Louis Thibault · Answer 10 · Tue Mar 19 2024 02:28:48 GMT+0800 (China Standard Time)

Absolutely no rush! I'm very, very glad you're able to contribute some cycles 😃
Please let me know if I can be of help!