Performance regression in `Normed` -> `Float` conversions on Julia v1.3.0

kimikage opened this issue 5 years ago

I have confirmed that Julia v1.2.0 and v1.3.0 give nearly identical results on Normed->Float conversions (#129, #138). However, I found a performance regression (~2x - 3x slower) on x86_64 machines in the following cases:

Vec4{N0f32} -> Vec4{Float32}
Vec4{N0f64} -> Vec4{Float32}
Vec4{N0f64} -> Vec4{Float64}

(cf. #129 (comment))

I'm not going to rush to investigate the cause or fix this problem. I'm submitting this issue as a placeholder in case any useful information turns up.

Tim Holy · Answer 1 · Thu Nov 28 2019 04:36:00 GMT+0800 (China Standard Time)

I think those types are very niche. I'm not that worried.

kimikage · Answer 2 · Fri Nov 29 2019 16:43:10 GMT+0800 (China Standard Time)

I agree, but my concern is the cause rather than the result. The investigation may help improve other methods (e.g. Fixed -> Float conversions).

Benchmark

julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Matrix of Vec4 (unit: μs)

w64	`Float32` v1.2.0	`Float32` v1.3.0	`Float64` v1.2.0	`Float64` v1.3.0
`N0f8`	3.814	3.571	4.499	5.725
`N5f3`	3.786	3.457	5.400	5.533
`N0f16`	4.000	3.871	5.100	6.100
`N13f3`	3.800	3.700	4.800	6.333
`N0f32`	4.583	13.599	5.599	6.767
`N8f24`	5.033	4.243	7.800	8.134
`N29f3`	4.933	4.300	6.600	6.367
`N0f64`	13.399	23.000	12.600	21.699
`N61f3`	13.200	12.199	11.400	11.599
`N0f128`	38.800	37.099	35.600	35.200
`N125f3`	44.099	40.199	38.500	40.299

`@code_typed`

julia> Base.VERSION
v"1.2.0"

julia> @code_typed Float32(1N0f32)
CodeInfo(
1 ─       goto #3 if not false
2 ─       nothing::Nothing
3 ┄ %3  = Base.getfield(x, :i)::UInt32
│   %4  = Base.bitcast(Int32, %3)::Int32
│   %5  = Base.lshr_int(%4, 0x0000000000000010)::Int32
│   %6  = Base.shl_int(%4, 0xfffffffffffffff0)::Int32
│   %7  = Base.ifelse(true, %5, %6)::Int32
│   %8  = Base.sitofp(Float32, %7)::Float32
│   %9  = Base.and_int(%4, 65535)::Int32
│   %10 = Base.shl_int(%9, 0x0000000000000008)::Int32
│   %11 = Base.ashr_int(%9, 0xfffffffffffffff8)::Int32
│   %12 = Base.ifelse(true, %10, %11)::Int32
│   %13 = Base.lshr_int(%4, 0x0000000000000018)::Int32
│   %14 = Base.shl_int(%4, 0xffffffffffffffe8)::Int32
│   %15 = Base.ifelse(true, %13, %14)::Int32
│   %16 = Base.or_int(%12, %15)::Int32
│   %17 = Base.sitofp(Float32, %16)::Float32
│   %18 = Base.mul_float(%17, 9.094947f-13)::Float32
│   %19 = Base.muladd_float(%8, 1.5258789f-5, %18)::Float32
└──       return %19
) => Float32
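
Read as plain Julia, the v1.2.0 listing above seems to split the raw `UInt32` into a high and a low half and combine them with two constants, `1.5258789f-5` (2^-16) and `9.094947f-13` (2^-40). The sketch below restates that arithmetic using only Base. The function name `n0f32_to_float32` and the argument `i` (the raw value behind the `:i` field) are assumptions for illustration, not the package's source.

```julia
# Hypothetical restatement of the arithmetic in the v1.2.0 listing.
# `i` is the raw UInt32 payload of an N0f32 value (the `:i` field in the IR).
function n0f32_to_float32(i::UInt32)
    # Top 16 bits, later scaled by 2^-16.
    hi = Float32(i >>> 0x10)
    # Low 16 bits shifted up by 8, with the top 8 bits of `i` OR-ed in as a
    # correction term; the result is below 2^24, so the conversion is exact.
    lo = Float32(((i & 0x0000ffff) << 0x08) | (i >>> 0x18))
    # i / (2^32 - 1) ≈ hi * 2^-16 + lo * 2^-40
    return muladd(hi, 1.5258789f-5, lo * 9.094947f-13)
end
```

Under these assumptions the result should agree with `i / (2^32 - 1)` to within `Float32` rounding; for example, `n0f32_to_float32(0xffffffff)` gives `1.0f0`.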

julia> Base.VERSION
v"1.3.0"

julia> @code_typed Float32(1N0f32)
CodeInfo(
1 ─       goto #3 if not false
2 ─       nothing::Nothing
3 ┄ %3  = Base.getfield(x, :i)::UInt32
│   %4  = Base.bitcast(Int32, %3)::Int32
│   %5  = Base.lshr_int(%4, 0x0000000000000010)::Int32
│   %6  = Base.shl_int(%4, 0xfffffffffffffff0)::Int32
│   %7  = Base.ifelse(true, %5, %6)::Int32
│   %8  = Base.sitofp(Float32, %7)::Float32
│   %9  = Base.and_int(%4, 65535)::Int32
│   %10 = Base.sle_int(0, 8)::Bool
│   %11 = Base.bitcast(UInt64, 8)::UInt64
│   %12 = Base.shl_int(%9, %11)::Int32
│   %13 = Base.neg_int(8)::Int64
│   %14 = Base.bitcast(UInt64, %13)::UInt64
│   %15 = Base.ashr_int(%9, %14)::Int32
│   %16 = Base.ifelse(%10, %12, %15)::Int32
│   %17 = Base.lshr_int(%4, 0x0000000000000018)::Int32
│   %18 = Base.shl_int(%4, 0xffffffffffffffe8)::Int32
│   %19 = Base.ifelse(true, %17, %18)::Int32
│   %20 = Base.or_int(%16, %19)::Int32
│   %21 = Base.sitofp(Float32, %20)::Float32
│   %22 = Base.mul_float(%21, 9.094947f-13)::Float32
│   %23 = Base.muladd_float(%8, 1.5258789f-5, %22)::Float32
└──       return %23
) => Float32

Oh gosh...
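
The two listings differ only in the `<< 8` applied to the low 16 bits. In v1.2.0 it appears as a single `shl_int` with a literal count behind `ifelse(true, ...)`, while in v1.3.0 the sign check on the count (`sle_int(0, 8)`), the negation, and both shift directions are left in the typed code. A minimal sketch of that pattern in plain Julia follows; both helper names are hypothetical, and whether this unfolded sequence is the actual cause of the slowdown is not established here.

```julia
# Hypothetical helper mirroring the unfolded sequence in the v1.3.0 listing:
# the shift count is a signed Int, so both directions are computed and
# `ifelse` selects one of them.
shl_signed_count(x::Int32, c::Int) =
    ifelse(0 <= c, x << unsigned(c), x >> unsigned(-c))

# Hypothetical helper with an unsigned count: only a left shift is possible,
# so there is no sign check on the count.
shl_unsigned_count(x::Int32, c::UInt) = x << c
```

For a non-negative count both helpers return the same value, so the difference is only in how much work is left for later optimization passes to remove.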