Performance regression in `Normed` -> `Float` conversions on Julia v1.3.0
kimikage opened this issue
I have confirmed that Julia v1.2.0 and v1.3.0 give nearly identical benchmark results for `Normed` -> `Float` conversions (#129, #138). However, I found a performance regression (roughly 1.7x to 3x slower) on x86_64 machines in the following cases:
- `Vec4{N0f32}` -> `Vec4{Float32}`
- `Vec4{N0f64}` -> `Vec4{Float32}`
- `Vec4{N0f64}` -> `Vec4{Float64}`
(cf. #129 (comment))
I'm not going to rush to investigate the cause or fix this problem; I'm submitting this issue as a placeholder in case any useful information turns up.
I think those types are very niche. I'm not that worried.
I agree, but my concern is the cause rather than the result. The investigation may help improve other methods (e.g. `Fixed` -> `Float` conversions).
Benchmark
```
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
```
Matrix of Vec4 (unit: μs)

| w64 (source type) | Float32 v1.2.0 | Float32 v1.3.0 | Float64 v1.2.0 | Float64 v1.3.0 |
|---|---|---|---|---|
| N0f8 | 3.814 | 3.571 | 4.499 | 5.725 |
| N5f3 | 3.786 | 3.457 | 5.400 | 5.533 |
| N0f16 | 4.000 | 3.871 | 5.100 | 6.100 |
| N13f3 | 3.800 | 3.700 | 4.800 | 6.333 |
| N0f32 | 4.583 | 13.599 | 5.599 | 6.767 |
| N8f24 | 5.033 | 4.243 | 7.800 | 8.134 |
| N29f3 | 4.933 | 4.300 | 6.600 | 6.367 |
| N0f64 | 13.399 | 23.000 | 12.600 | 21.699 |
| N61f3 | 13.200 | 12.199 | 11.400 | 11.599 |
| N0f128 | 38.800 | 37.099 | 35.600 | 35.200 |
| N125f3 | 44.099 | 40.199 | 38.500 | 40.299 |
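The slowdown factors for the three regressed rows can be computed directly from the table. The snippet below only reuses those timings; the variable names are illustrative.

```julia
# Timings copied from the benchmark table (μs), as (case, v1.2.0, v1.3.0).
regressed = [
    ("N0f32 -> Float32", 4.583, 13.599),
    ("N0f64 -> Float32", 13.399, 23.000),
    ("N0f64 -> Float64", 12.600, 21.699),
]

# v1.3.0 time divided by v1.2.0 time for each case
ratios = Dict(name => new / old for (name, old, new) in regressed)

for (name, old, new) in regressed
    println(rpad(name, 18), round(new / old; digits = 2), "x")
end
```

This prints about 2.97x for `N0f32 -> Float32` and about 1.72x for both `N0f64` cases.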
@code_typed
```
julia> Base.VERSION
v"1.2.0"

julia> @code_typed Float32(1N0f32)
CodeInfo(
1 ─ goto #3 if not false
2 ─ nothing::Nothing
3 ┄ %3 = Base.getfield(x, :i)::UInt32
│ %4 = Base.bitcast(Int32, %3)::Int32
│ %5 = Base.lshr_int(%4, 0x0000000000000010)::Int32
│ %6 = Base.shl_int(%4, 0xfffffffffffffff0)::Int32
│ %7 = Base.ifelse(true, %5, %6)::Int32
│ %8 = Base.sitofp(Float32, %7)::Float32
│ %9 = Base.and_int(%4, 65535)::Int32
│ %10 = Base.shl_int(%9, 0x0000000000000008)::Int32
│ %11 = Base.ashr_int(%9, 0xfffffffffffffff8)::Int32
│ %12 = Base.ifelse(true, %10, %11)::Int32
│ %13 = Base.lshr_int(%4, 0x0000000000000018)::Int32
│ %14 = Base.shl_int(%4, 0xffffffffffffffe8)::Int32
│ %15 = Base.ifelse(true, %13, %14)::Int32
│ %16 = Base.or_int(%12, %15)::Int32
│ %17 = Base.sitofp(Float32, %16)::Float32
│ %18 = Base.mul_float(%17, 9.094947f-13)::Float32
│ %19 = Base.muladd_float(%8, 1.5258789f-5, %18)::Float32
└── return %19
) => Float32
```
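Reading the v1.2.0 IR, the conversion splits the raw 32-bit value `i` so that every intermediate is exact in `Float32`: `i >>> 16` is scaled by 2^-16 (the `1.5258789f-5` constant), and `((i & 0xffff) << 8) | (i >>> 24)` is scaled by 2^-40 (`9.094947f-13`). Since `i / (2^32 - 1) = i/2^32 + i/2^64 + ...`, the top 8 bits placed in the low byte approximate the `i/2^64` term. The function below is a sketch reconstructed from that IR, not FixedPointNumbers' source; `n0f32_to_float32_sketch` is a hypothetical name.

```julia
# Sketch reconstructed from the v1.2.0 @code_typed output (not FixedPointNumbers' source).
# `n0f32_to_float32_sketch` is a hypothetical name; `i` is the raw UInt32 of an N0f32.
function n0f32_to_float32_sketch(i::UInt32)
    s = reinterpret(Int32, i)               # the IR works on the bits as Int32
    hi = Float32(s >>> 16)                  # top 16 bits (logical shift); exact in Float32
    # low 16 bits moved up by 8, plus the top 8 bits as a correction term;
    # the result is < 2^24, so the Float32 conversion below is exact
    mid = ((s & 0xffff) << 8) | (s >>> 24)
    # ldexp(1f0, -16) and ldexp(1f0, -40) equal 1.5258789f-5 and 9.094947f-13 in the IR
    return muladd(hi, ldexp(1f0, -16), Float32(mid) * ldexp(1f0, -40))
end
```

For example, `n0f32_to_float32_sketch(typemax(UInt32))` evaluates `1 - 2^-40` before rounding and returns `1.0f0`. The `<< 8` in the middle line is the step whose typed IR differs in v1.3.0.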
```
julia> Base.VERSION
v"1.3.0"

julia> @code_typed Float32(1N0f32)
CodeInfo(
1 ─ goto #3 if not false
2 ─ nothing::Nothing
3 ┄ %3 = Base.getfield(x, :i)::UInt32
│ %4 = Base.bitcast(Int32, %3)::Int32
│ %5 = Base.lshr_int(%4, 0x0000000000000010)::Int32
│ %6 = Base.shl_int(%4, 0xfffffffffffffff0)::Int32
│ %7 = Base.ifelse(true, %5, %6)::Int32
│ %8 = Base.sitofp(Float32, %7)::Float32
│ %9 = Base.and_int(%4, 65535)::Int32
│ %10 = Base.sle_int(0, 8)::Bool
│ %11 = Base.bitcast(UInt64, 8)::UInt64
│ %12 = Base.shl_int(%9, %11)::Int32
│ %13 = Base.neg_int(8)::Int64
│ %14 = Base.bitcast(UInt64, %13)::UInt64
│ %15 = Base.ashr_int(%9, %14)::Int32
│ %16 = Base.ifelse(%10, %12, %15)::Int32
│ %17 = Base.lshr_int(%4, 0x0000000000000018)::Int32
│ %18 = Base.shl_int(%4, 0xffffffffffffffe8)::Int32
│ %19 = Base.ifelse(true, %17, %18)::Int32
│ %20 = Base.or_int(%16, %19)::Int32
│ %21 = Base.sitofp(Float32, %20)::Float32
│ %22 = Base.mul_float(%21, 9.094947f-13)::Float32
│ %23 = Base.muladd_float(%8, 1.5258789f-5, %22)::Float32
└── return %23
) => Float32
```
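Comparing the two outputs, the only difference is the `(i & 0xffff) << 8` step. In v1.2.0 the shift amounts are already folded into constants and the selector is `ifelse(true, ...)`; in v1.3.0 the IR still evaluates `sle_int(0, 8)`, `bitcast(UInt64, 8)` and `neg_int(8)`, then picks between `shl_int` and `ashr_int` at run time. That shape matches a left shift by a signed `Int` count, which, to my understanding, Base implements as `ifelse(0 <= y, x << unsigned(y), x >> unsigned(-y))`. The helpers below are hypothetical and only illustrate the pattern; whether the unfolded sign test is actually what makes the vectorized `Vec4` loops slower has not been established (LLVM may still fold it).

```julia
# Hypothetical helpers, named for illustration only.

# Same shape as the v1.3.0 IR: a run-time sign test on an Int count,
# then either shl_int (count >= 0) or ashr_int (count < 0), selected by ifelse.
shift_signed_count(x::Int32, y::Int) = ifelse(0 <= y, x << unsigned(y), x >> unsigned(-y))

# With an unsigned count there is no sign to test, so only the left shift remains.
shift_unsigned_count(x::Int32, y::UInt) = x << y
```

Running `@code_typed` on a small caller that passes a literal `8` to each helper is one way to check whether the sign test survives on a given Julia version.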
Oh gosh...