inetaf / netaddr

Network address types

uint128 commit caused across-the-board performance regression

josharian opened this issue

It's particularly noticeable in IPPrefix.Contains, which doubled in execution time.

The problem is that the compiler doesn't SSA arrays, so we generate a whole bunch of useless register moves.

The fix is probably to make the uint128 type a struct with lo and hi members.
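To make the proposal concrete, here is a minimal sketch of the two representations side by side. The type names `uint128Array` and `uint128Struct` and the `and` method are hypothetical, chosen for illustration; they are not the package's actual code. Whether the struct form is actually faster depends on the compiler version, so it should be confirmed with a benchmark.

```go
package main

import "fmt"

// uint128Array is a hypothetical stand-in for the array-backed
// representation: index 0 is the high half, index 1 the low half.
type uint128Array [2]uint64

// and returns the bitwise AND of u and m.
func (u uint128Array) and(m uint128Array) uint128Array {
	return uint128Array{u[0] & m[0], u[1] & m[1]}
}

// uint128Struct is the proposed alternative (also hypothetical here):
// the two halves are named fields instead of array elements.
type uint128Struct struct {
	hi uint64
	lo uint64
}

// and returns the bitwise AND of u and m.
func (u uint128Struct) and(m uint128Struct) uint128Struct {
	return uint128Struct{hi: u.hi & m.hi, lo: u.lo & m.lo}
}

func main() {
	a := uint128Array{0xffff000000000000, 0x1234}
	am := uint128Array{0xff00000000000000, 0}
	ra := a.and(am)

	s := uint128Struct{hi: a[0], lo: a[1]}
	sm := uint128Struct{hi: am[0], lo: am[1]}
	rs := s.and(sm)

	// Both representations hold the same bits; only the layout the
	// compiler has to reason about differs.
	fmt.Printf("array:  hi=%#x lo=%#x\n", ra[0], ra[1])
	fmt.Printf("struct: hi=%#x lo=%#x\n", rs.hi, rs.lo)
}
```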

cc @bradfitz @danderson

Or y'all could stop golfing? 😜 🤷‍♂️

It's actually starting to worry me. I'd much prefer readable code.

Oh, I thought you were talking about the compiler team for a moment, golfing instead of making arrays SSA-able. :P

I actually think .hi and .lo are clearer than [0] and [1]: I don't have to think about which is the high set of bits.

Yes, I think this package is sufficiently optimized. (Unless/until we have an actual bottleneck that we can trace back to it.)

I like being able to index into the hi and lo halves without branches. Looks cleaner to me.

The one place we use that ability is:

// v6 returns the i'th byte of ip. If ip is an IPv4 address, this
// accesses the IPv4-mapped IPv6 address form of the IP.
func (ip IP) v6(i uint8) uint8 {
	return uint8(ip.addr[(i/8)%2] >> ((7 - i%8) * 8))
}

I can't say I find the branchless indexing here all that clear.
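For comparison, a rough sketch of what the same accessor might look like with an explicit branch on hi/lo. `v6Array` mirrors the snippet above as a free function; `v6Branch` is a hypothetical alternative, not code from the package. The two agree for byte indexes 0 through 15; for larger indexes the array form wraps around while the branch form keeps reading the low half.

```go
package main

import "fmt"

// v6Array mirrors the branchless form quoted above: addr[0] holds the
// high 8 bytes of the address, addr[1] the low 8 bytes.
func v6Array(addr [2]uint64, i uint8) uint8 {
	return uint8(addr[(i/8)%2] >> ((7 - i%8) * 8))
}

// v6Branch is a hypothetical hi/lo variant that selects the half with
// an if statement instead of an index computation.
func v6Branch(hi, lo uint64, i uint8) uint8 {
	half := hi
	if i >= 8 {
		half = lo
	}
	return uint8(half >> ((7 - i%8) * 8))
}

func main() {
	// 2001:db8::1 split into its high and low 64 bits.
	hi, lo := uint64(0x20010db800000000), uint64(1)
	for i := uint8(0); i < 16; i++ {
		fmt.Printf("%02x %02x\n", v6Array([2]uint64{hi, lo}, i), v6Branch(hi, lo, i))
	}
}
```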

But this is all deck chairs / bike sheds. I'll go think about something else instead.

I personally prefer hi/lo as I'm used to working with big-endian bits and that scans better than an array, but I'm not super fussed either way. I find the branchless indexing hard to reason about, and in fact implementing v6u16 in those terms hurt my brain a bit until I got the math right. That said, if we're golfing... The branchless version makes v6 and v6u16 inlinable :P
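For anyone following along, a sketch of what a branchless 16-bit accessor could look like over the array form. This is an illustration written as a free function with an assumed signature, not necessarily the package's exact `v6u16`. The inlining claim can be checked on a given toolchain with `go build -gcflags=-m`, which prints the compiler's inlining decisions.

```go
package main

import "fmt"

// v6u16 returns the i'th 16-bit group (0 through 7) of a 128-bit
// address stored as two uint64 halves, high half first. Each half
// holds four groups, so i/4 selects the half and i%4 the group in it.
func v6u16(addr [2]uint64, i uint8) uint16 {
	return uint16(addr[(i/4)%2] >> ((3 - i%4) * 16))
}

func main() {
	// 2001:db8::1 split into its high and low 64 bits.
	addr := [2]uint64{0x20010db800000000, 1}
	for i := uint8(0); i < 8; i++ {
		fmt.Printf("%04x\n", v6u16(addr, i))
	}
}
```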

Agree on performance. My goal in the last few days has been to stop delegating to stdlib methods and get performance that's on par with stdlib, or better if achievable without writing asm-in-go. I'm reasonably happy with where the base types are now; my next focus is adjusting the API things that I think need changing.

> I like being able to index into the hi and lo halves without branches. Looks cleaner to me.
> The one place we use that ability is:

And it is the only one, and it is internal.

> I find the branchless indexing hard to reason about,

I agree. The benchmark is king 🏌️; the compiler often refutes casual reasoning.

I think we can get both properties actually with a method that returns [2]*uint64
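A minimal sketch of that idea, under assumptions: `uint128` is a struct with `hi` and `lo` fields, `ipSketch` is a simplified hypothetical stand-in for the IP type, and `halves` is an illustrative helper name. Whether this keeps the accessors inlinable and avoids the extra register moves would still need a benchmark.

```go
package main

import "fmt"

// uint128 uses named halves, as proposed earlier in the thread.
type uint128 struct {
	hi uint64
	lo uint64
}

// ipSketch is a simplified, hypothetical stand-in for the IP type.
type ipSketch struct {
	addr uint128
}

// halves returns pointers to the two halves, high half first, so that
// callers can still select a half by computed index.
func (ip *ipSketch) halves() [2]*uint64 {
	return [2]*uint64{&ip.addr.hi, &ip.addr.lo}
}

// v6 returns the i'th byte of the address using indexed access on top
// of the struct representation.
func (ip ipSketch) v6(i uint8) uint8 {
	return uint8(*(ip.halves()[(i/8)%2]) >> ((7 - i%8) * 8))
}

func main() {
	// 2001:db8::1 split into its high and low 64 bits.
	ip := ipSketch{addr: uint128{hi: 0x20010db800000000, lo: 1}}
	for i := uint8(0); i < 16; i++ {
		fmt.Printf("%02x ", ip.v6(i))
	}
	fmt.Println()
}
```

Because `halves` hands out pointers, it also allows writing to a half by index, which a value-returning helper would not.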

Will try in an hour or two.