JuliaMath / FixedPointNumbers.jl

fixed point types for julia

performance on clamp

johnnychen94 opened this issue · comments

This is expected, but I think it can be optimized.

julia> @btime clamp!($(rand(N0f8, 512, 512)), 0.1, 0.9);
  813.628 μs (0 allocations: 0 bytes)

julia> @btime clamp!($(rand(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  628.583 μs (0 allocations: 0 bytes)

julia> @btime clamp.($(rand(N0f8, 512, 512)), 0.1, 0.9);
  691.843 μs (3 allocations: 2.00 MiB)

julia> @btime clamp.($(rand(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  1.146 ms (3 allocations: 256.19 KiB)

Ref: observed in JuliaImages/ImageContrastAdjustment.jl#28 (comment)

I don't understand what the benchmarks above are intended to show. :confused:

Is this what you mean?

julia> @btime clamp!($(zeros(N0f8, 512, 512)), 0.1, 0.9);
  505.900 μs (0 allocations: 0 bytes)

julia> @btime clamp!($(zeros(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  114.299 μs (0 allocations: 0 bytes)

julia> @btime clamp.($(zeros(N0f8, 512, 512)), 0.1, 0.9);
  687.300 μs (2 allocations: 2.00 MiB)

julia> @btime clamp.($(zeros(N0f8, 512, 512)), 0.1N0f8, 0.9N0f8);
  179.200 μs (2 allocations: 256.14 KiB)

Yes, I also think it can be optimized (specialized) as follows:

Base.clamp(x::X, lo::X, hi::X) where X<:FixedPoint = X(clamp(x.i, lo.i, hi.i), 0)
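The shortcut works because the raw-to-value map is monotonic. A minimal package-free sketch of the idea, using a hypothetical UInt8-backed stand-in for N0f8 (RawFixed, value, and rawclamp are illustrative names, not part of FixedPointNumbers):

```julia
# Hypothetical stand-in for an N0f8-like type: raw byte i represents i/255.
struct RawFixed
    i::UInt8
end

# Reference semantics: the real value the raw byte represents.
value(x::RawFixed) = x.i / 255

# Because i -> i/255 is monotonic, clamping the raw integers agrees
# with clamping the represented values, with no float round trip.
rawclamp(x::RawFixed, lo::RawFixed, hi::RawFixed) =
    RawFixed(clamp(x.i, lo.i, hi.i))

x, lo, hi = RawFixed(0xff), RawFixed(0x19), RawFixed(0xe6)
value(rawclamp(x, lo, hi)) == clamp(value(x), value(lo), value(hi))  # true
```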

However, the bottleneck should be identified first. I don't think the floating-point conversions here are "too" slow; they are simply inherently expensive.

I was thinking of a specialization like this:

Base.clamp(x::X, lo, hi) where X<:FixedPoint = clamp(x, X(lo), X(hi))
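The gap in the benchmarks above comes from type promotion: when the bounds don't match the element type, every comparison promotes the element. The same mechanism can be seen package-free with Float32 standing in for N0f8:

```julia
A = rand(Float32, 4)

# Float64 bounds promote each element during the comparison:
eltype(clamp.(A, 0.1, 0.9))      # Float64
# Bounds converted to the element type up front avoid the round trip:
eltype(clamp.(A, 0.1f0, 0.9f0))  # Float32
```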

I opened this as a potential issue to track and haven't spent much time playing with it yet; feel free to close it if you think it doesn't make sense.

clamp! can be optimized, but I don't want to do it because there are many "in-place" variants.
clamp must not be changed as you suggested. FixedPoint is not a pixel type (e.g. Gray{N0f8}).

feel free to close it if you think it doesn't make sense.

This issue clearly makes sense, but I don't know what you want.

TBH, I had believed that the optimization for clamp(x::X, lo::X, hi::X) where X<:FixedPoint was done by the compiler (LLVM backend). I think it is worth introducing. :rocket:

Oh, I wrote this down as a quick note to myself and didn't spend enough time playing with it; I'm sorry I didn't make it clear.

This is expected, but I think it can be optimized.

What I meant to say was: "I believe this performance gap is expected, but I think there are merits to proactively converting lo and hi to the FixedPoint type." The general philosophy, I believe, is to not care about the storage type, since it's hard to intuit the types in practice; when we tweak performance by manually converting/promoting types, that's a code smell to me.

I'm in a bit of a hurry catching up with my own deadlines; I'll post an update later if I find anything worth optimizing.

clamp! can be optimized, but I don't want to do it because there are many "in-place" variants.

I don't quite understand the reasoning behind this, can you clarify it a bit? Thanks. My understanding is that in-place functions in Base are worth extending.

clamp! can be optimized, but I don't want to do it because there are many "in-place" variants.

I don't quite understand the reasoning behind this, can you clarify it a bit?

The specialization of many "in-place" methods requires much labor and offers little benefit.

I believe the general philosophy is to not care about storage type

I agree with you. Therefore, I think clamp should be preferred wherever it can be used instead of clamp!. When we use clamp!, we must "definitely" pay attention to the storage type! The performance issues above are the result of "carelessness". :confused:

Also, if we follow the philosophy, clamp MUST NOT convert the "return" type.

Certainly the specialization in #179 (comment) makes sense. The one in #179 (comment) is much more dangerous, since currently clamp(x, -Inf, Inf) succeeds for all FixedPoint values but that definition would throw an error since Inf does not have a representation as FixedPoint. We could, however, define clamp_fast and make @fastmath do semantic replacement (this would presumably require a PR to Base). It might take a little work to get it to hoist the conversion for broadcasting, though.
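The danger can be checked directly (assuming FixedPointNumbers is loaded; the exact error message may differ across versions):

```julia
using FixedPointNumbers

x = 0.5N0f8
clamp(x, -Inf, Inf)   # succeeds today: x promotes to Float64

# A definition that eagerly converts the bounds would throw instead,
# because Inf has no N0f8 representation:
# N0f8(Inf)           # error: Inf is outside the representable range [0, 1]
```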

Cases like @fastmath clamp.(A * B, lo, hi) would be trouble, though, if A and B don't have the same element type. This could get complex pretty quickly. The most robust approach is to pass lo and hi of the correct type manually.
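For the robust manual approach, converting the bounds once outside the broadcast keeps the element type and matches the fast cases in the benchmarks above:

```julia
using FixedPointNumbers

A = rand(N0f8, 512, 512)
lo, hi = 0.1N0f8, 0.9N0f8    # convert the bounds once, up front
B = clamp.(A, lo, hi)        # stays N0f8; no per-element Float64 round trip
eltype(B)                    # N0f8
```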

I merged PR #194. I will keep this issue open if we plan to improve this further; otherwise I will close it.

Thanks for working on this. #194 looks like a good-enough improvement to me.