Optimizing `Fixed` --> `Float` conversions
kimikage opened this issue
See: #129 (comment)
Although this has been postponed, this is an issue with conversion, not arithmetic, so it might be better to include it in the next release (v0.8.0).
(::Type{Tf})(x::Fixed{T,f}) where {Tf <: AbstractFloat, T, f} = Tf(Tf(x.i) * Tf(@exp2(-f)))
Base.Float16(x::Fixed{T,f}) where {T, f} = Float16(Float32(x))
Base.Float32(x::Fixed{T,f}) where {T, f} = Float32(x.i) * Float32(@exp2(-f))
Base.Float64(x::Fixed{T,f}) where {T, f} = Float64(x.i) * @exp2(-f)
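The idea behind these methods can be sketched without the package: convert the raw integer to the target float type, then scale it by 2^-f. The sketch below is only an illustration under assumptions. `DemoFixed` and `to_float` are hypothetical names that are not part of FixedPointNumbers, and `ldexp` stands in for the package's internal `@exp2` macro.

```julia
# Hypothetical stand-in for `Fixed{T,f}`: `i` is the raw integer,
# `f` is the number of fractional bits.
struct DemoFixed{T<:Signed,f}
    i::T
end

# Convert by scaling the raw integer with 2^-f.
# `ldexp(one(Tf), -f)` builds 2^-f directly in the target float type
# (an assumption of this sketch; the package uses `@exp2(-f)`).
function to_float(::Type{Tf}, x::DemoFixed{T,f}) where {Tf<:AbstractFloat,T,f}
    return Tf(x.i) * ldexp(one(Tf), -f)
end

# Raw value 12 with 3 fractional bits represents 12 / 2^3.
a = to_float(Float32, DemoFixed{Int8,3}(12))
# The most negative raw Int16 with 15 fractional bits represents -2^15 / 2^15.
b = to_float(Float64, DemoFixed{Int16,15}(typemin(Int16)))
```

In both examples the raw integer fits exactly in the target float type, so multiplying by a power of two introduces no additional rounding.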
Benchmark
There seems to be no significant difference between Julia versions or between operating systems.
There is a slowdown in converting `Vec3{Fixed{Int16}}` arrays to `Vec3{Float32}` arrays, but this is mainly a problem with the LLVM backend.
Script
using BenchmarkTools
using FixedPointNumbers
struct Vec3{T <: Real}
    x::T; y::T; z::T
end
struct Vec4{T <: Real}
    x::T; y::T; z::T; w::T
end
Vec3{T}(v::Vec3{T}) where {T} = v
Vec3{T}(v::Vec3{U}) where {T, U} = Vec3{T}(v.x, v.y, v.z)
Vec4{T}(v::Vec4{T}) where {T} = v
Vec4{T}(v::Vec4{U}) where {T, U} = Vec4{T}(v.x, v.y, v.z, v.w)
Base.rand(::Type{Vec3{T}}) where {T} = Vec3{T}(rand(T), rand(T), rand(T))
Base.rand(::Type{Vec4{T}}) where {T} = Vec4{T}(rand(T), rand(T), rand(T), rand(T))
function Base.rand(::Type{T}, sz::Dims) where {T <: Union{Vec3, Vec4}}
    A = Array{T}(undef, sz)
    for i in eachindex(A); A[i] = rand(T); end
    return A
end
Ts = (Q0f7, Q4f3, Q0f15, Q12f3, Q0f31, Q28f3, Q0f63, Q60f3)
mat3s = [rand(Vec3{T}, 64, 64) for T in Ts];
mat4s = [rand(Vec4{T}, 64, 64) for T in Ts];
for mat in mat3s
    println(eltype(mat), "-> Float32")
    @btime Vec3{Float32}.(view($mat,:,:))
end
for mat in mat3s
    println(eltype(mat), "-> Float64")
    @btime Vec3{Float64}.(view($mat,:,:))
end
for mat in mat4s
    println(eltype(mat), "-> Float32")
    @btime Vec4{Float32}.(view($mat,:,:))
end
for mat in mat4s
    println(eltype(mat), "-> Float64")
    @btime Vec4{Float64}.(view($mat,:,:))
end
Julia v1.3.1 x86_64-w64-mingw32
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Matrix of Vec3 (unit: μs)
w64 | Float32 master | Float32 optimized | Float64 master | Float64 optimized
---|---|---|---|---
Q0f7 | 3.943 | 3.350 | 10.700 | 5.575
Q4f3 | 3.886 | 3.362 | 10.900 | 5.420
Q0f15 | 4.043 | 4.850 | 11.100 | 5.480
Q12f3 | 4.057 | 4.850 | 11.001 | 5.840
Q0f31 | 5.960 | 5.000 | 9.199 | 5.520
Q28f3 | 6.000 | 5.000 | 9.199 | 5.640
Q0f63 | 30.600 | 5.183 | 48.299 | 6.150
Q60f3 | 25.799 | 5.617 | 47.401 | 6.150
Matrix of Vec4 (unit: μs)
w64 | Float32 master | Float32 optimized | Float64 master | Float64 optimized
---|---|---|---|---
Q0f7 | 16.399 | 3.643 | 17.199 | 5.700
Q4f3 | 15.500 | 3.614 | 17.200 | 6.100
Q0f15 | 15.100 | 3.643 | 17.100 | 5.801
Q12f3 | 15.000 | 3.750 | 20.699 | 6.200
Q0f31 | 13.899 | 3.783 | 17.001 | 5.499
Q28f3 | 13.800 | 3.743 | 17.199 | 7.133
Q0f63 | 57.300 | 13.499 | 65.000 | 10.800
Q60f3 | 60.800 | 13.699 | 66.699 | 10.600
Julia v1.0.5 x86_64-pc-linux-gnu on WSL
julia> versioninfo()
Julia Version 1.0.5
Commit 3af96bcefc (2019-09-09 19:06 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Matrix of Vec3 (unit: μs)
linux | Float32 master | Float32 optimized | Float64 master | Float64 optimized
---|---|---|---|---
Q0f7 | 3.914 | 3.362 | 10.900 | 5.400
Q4f3 | 4.000 | 3.425 | 11.300 | 5.620
Q0f15 | 4.057 | 5.033 | 11.800 | 5.800
Q12f3 | 4.057 | 5.050 | 11.700 | 5.875
Q0f31 | 6.100 | 5.133 | 9.800 | 5.640
Q28f3 | 6.120 | 5.150 | 9.600 | 5.700
Q0f63 | 48.700 | 5.450 | 51.900 | 6.180
Q60f3 | 44.400 | 5.417 | 46.900 | 6.375
Matrix of Vec4 (unit: μs)
linux | Float32 master | Float32 optimized | Float64 master | Float64 optimized
---|---|---|---|---
Q0f7 | 14.200 | 3.471 | 16.500 | 6.467
Q4f3 | 14.700 | 3.414 | 16.700 | 6.633
Q0f15 | 14.000 | 3.614 | 16.100 | 6.667
Q12f3 | 13.500 | 3.557 | 16.300 | 5.967
Q0f31 | 12.300 | 3.486 | 14.400 | 7.000
Q28f3 | 12.200 | 3.543 | 14.300 | 7.300
Q0f63 | 62.900 | 14.300 | 64.400 | 11.400
Q60f3 | 56.500 | 14.200 | 56.500 | 11.600