nholthaus / units

a compile-time, header-only, dimensional analysis and unit conversion library built on c++14 with no dependencies.

Home Page: http://nholthaus.github.io/units/

unit conversion emits a surprisingly high amount of instructions

chiphogg opened this issue

Consider this round-trip unit conversion, using both the nholthaus library and Au.

double nholthaus_round_trip(double x) {
    return ::units::angle::degree_t{::units::angle::radian_t{::units::angle::degree_t{x}}}.value();
}

double au_round_trip(double x) {
    return au::degrees(x).as(au::radians).in(au::degrees);
}

I expected these to emit essentially equivalent code, but was surprised to see a huge difference in the number of instructions. This translates into an actual runtime performance penalty, which is likely avoidable. (That said, I highly doubt that unit conversions should ever occur in the "hot loop" of a well-designed program, so this is probably not a meaningful performance penalty.)

Here's a godbolt link using clang 16.0.0.

For nholthaus, we see two things. First, it multiplies and divides by pi and 180 as separate runtime operations, instead of combining them into a single factor pi / 180 at compile time. Second, it emits a surprisingly large number of instructions that I can't explain (I'm not well versed in assembly):

[image: clang 16.0.0 assembly output for nholthaus_round_trip]

For Au, we can see that the factors are combined into one (we see ~57.3, and its inverse). And we emit only two instructions:

[image: clang 16.0.0 assembly output for au_round_trip]

Here's the godbolt link for gcc 13.2. I didn't use this one first because the other one actually generates comments to show you what values are being used, which is nice.

Anyway, we can see the nholthaus code looks much more reasonable for gcc than for clang, although it still emits more instructions than Au. Here it is:

[image: gcc 13.2 assembly output for nholthaus_round_trip]

And here's what we get for Au (still just two instructions):

[image: gcc 13.2 assembly output for au_round_trip]


What's the upshot? I guess it would be nice to consider combining the conversion factors into a single value, computed at compile time. This doc on applying conversion factors may be useful reading here.

I'm also curious why clang emits so much more code than gcc does, but I assume if we switched to a single conversion factor then this would all go away and the point would be moot. (Although it'd be interesting if we found that it didn't!)


Please include the following information in your issue:

  1. Which version of units you are using

The current master.

  2. Which compiler exhibited the problem (including compiler version)

clang 16.0.0 and gcc 13.2