generalize the ColorMixture concept

Question

generalize the ColorMixture concept

johnnychen94 opened this issue 2 years ago · comments

IIUC this package provides a pixel version of the deprecated ColorizedArray. Now I can see that this is a useful concept in fluorophore imaging. Compared to defining the Color subtypes manually for every combination, the benefit of this ColorMixture concept is to provide a user-friendly interface to design a new color type, which is great.

But fluorophore image is not the only field that has this concept; this can be generalized to hyperspectral imaging where there can be as many channels as the hardware supports.

Thus I'm wondering if we can 1) separate this concept into a standalone package, e.g., MixedColors.jl, and, 2) generalize this concept and allow the user library (e.g., FluorophoreColors) to define its domain-specific color types.

A draft type design for this can be:

MixedColors.jl

struct MixedColorInfo{CS, F}
    cs::CS
    to_rgb::F
end

struct MixedColor{T,N,INFO} <: Color{T,N}
    channels::NTuple{N,T}
end

User library:

# FluorophoreColors.jl
_to_rgb(channels, cs) = sum(map(*, channels, cs))
const _FRGB_INFO = MixedColorInfo(cs, _to_rgb)
const FRGB{T,N} = MixedColor{T,N,_FRGB_INFO}

# with other Fluorophore-specific definitions, e.g., loading the definition from csv file.
...

I believe the parametrization trick on values Cs (instead of types typeof(Cs)) works because the types of fluorephore colors are limited and RGB{N0f8} is internally tuple of integers. Thus here I extend it to a struct, but I'm not sure if parametrization on the function to_rgb works reliably; I guess it works because we also have things like map(::typeof(+), ...) usage.

Tim Holy · Answer 1 · Wed May 04 2022 17:02:08 GMT+0800 (China Standard Time)

I like the idea of generalizing it to hyperspectral imaging. I think we could still call the package MixedColors while also providing lookup tables for fluorophores: it's just two fairly small CSV files, and I think creating a standalone package just for the constants isn't worth it.

I also like your design (as usual) separating the color info from the instance. From the standpoint of pixel storage, the two should be equivalent (same amount of memory, both reinterpretable, etc), which is exactly what we need. Let's try a demo to check viability:

julia> using FluorophoreColors

julia> cs = (fluorophore_rgb["EGFP"], fluorophore_rgb["tdTomato"])
(RGB{N0f8}(0.0,0.925,0.365), RGB{N0f8}(1.0,0.859,0.0))

julia> struct MixedColorInfo{CS, F}
           cs::CS
           to_rgb::F
       end

julia> struct MixedColor{T,N,INFO} <: Color{T,N}
           channels::NTuple{N,T}
       end

julia> @inline _to_rgb(channels, cs) = sum(map(*, channels, cs))
_to_rgb (generic function with 1 method)

julia> const _FRGB_INFO = MixedColorInfo(cs, _to_rgb)
MixedColorInfo{Tuple{RGB{N0f8}, RGB{N0f8}}, typeof(_to_rgb)}((RGB{N0f8}(0.0,0.925,0.365), RGB{N0f8}(1.0,0.859,0.0)), _to_rgb)

julia> const FRGB{T,N} = MixedColor{T,N,_FRGB_INFO}
FRGB{T, N} where {T, N} (alias for MixedColor{T, N, MixedColorInfo{Tuple{RGB{Normed{UInt8, 8}}, RGB{Normed{UInt8, 8}}}, typeof(_to_rgb)}((RGB{N0f8}(0.0,0.925,0.365), RGB{N0f8}(1.0,0.859,0.0)), _to_rgb)} where {T, N})

julia> @inline Base.convert(::Type{RGB}, c::MixedColor{T,N,INFO}) where {T,N,INFO} = INFO.to_rgb(c.channels, INFO.cs)

julia> c = FRGB{N0f16,2}((0.2, 0.8));

julia> sizeof(c)
4

julia> convert(RGB, c)
RGB{N0f16}(0.8,0.87214,0.07294)

julia> f(c) = convert(RGB, c)
f (generic function with 1 method)

and to compare with the current implementation of this package:

julia> ctemplate = ColorMixture{N0f16}(cs);

julia> c2 = ctemplate(c.channels...)
(0.2N0f16₁, 0.8N0f16₂)

julia> f(c2) === f(c)
true

All good so far. But there's a small performance gap that I don't yet understand:

julia> using BenchmarkTools

julia> @btime f($c)
  2.957 ns (0 allocations: 0 bytes)
RGB{N0f16}(0.8,0.87214,0.07294)

julia> @btime f($c2)
  2.218 ns (0 allocations: 0 bytes)
RGB{N0f16}(0.8,0.87214,0.07294)

What's odd is that in general terms the code_typed of the two are very, very similar: AFAICT the only differences are small changes in the ordering of certain operations. (You can easily compare the two with open(filename, "w") do io; print(io, @code_typed f(args...)); end and then using a graphical diff tool like meld to visualize the differences.) One observation is that without the @inlines above the gap is much larger because the type-inferred code still has some invokes in it. So perhaps it's a limitation of the optimizer under forced inlining?

In the end of the day I think the functionality of your proposal merits the change, but it would be lovely to eliminate this performance gap.

Johnny Chen · Answer 2 · Wed May 04 2022 19:56:22 GMT+0800 (China Standard Time)

Most such image representation uses the struct of array layout because the relationship between different channels are usually weak. If we consider this fact, an optimized algorithm can just provide a specialization on the StructArray (or related trait types) as JuliaImages/ImageCore.jl#180

Thus in most cases, this pixel-wise conversion only serves the "making the fallback generic solution work and work quite okay" purpose. For this reason, I won't be too worried about the performance. I could be wrong, of course.

Tim Holy · Answer 3 · Thu May 05 2022 01:44:10 GMT+0800 (China Standard Time)

this pixel-wise conversion only serves the "making the fallback generic solution work and work quite okay" purpose

This makes me rethink: how much flexibility do we need, and how much do we want? I'm beginning to return to thinking that perhaps my slightly less-flexible design might be better, although I don't think there's a really strong pull one way or another.

First, keep in mind that this package already supports hyperspectral imaging, it just lacks the convenient name => RGB mapping for "named" fluorophores. But you can construct ctemplate with however many colors you need. Definitely agreed we should broaden the name of the package to reflect this reality.

Second and more fundamentally, the only real restriction here, compared to the alternative you proposed, is that my implementation only supports RGB conversions of the form sum(w .* cs) where cs is a list of RGB colors, whereas your implementation allows an arbitrary f. However, folks can get that with MappedArrays, by supplying an f(c) -> cnew that generates a new color from c. One advantage of not having that flexibility here is that we could display a nice summary of such an array. Currently we have

julia> A = [(rand(N0f8), rand(N0f8)) for i = 1:3, j = 1:4]
3×4 Matrix{Tuple{N0f8, N0f8}}:
 (0.576, 0.345)  (1.0, 0.925)    (0.757, 0.843)  (0.647, 0.671)
 (0.871, 0.282)  (0.18, 0.259)   (0.204, 0.988)  (0.043, 0.816)
 (0.733, 0.078)  (0.349, 0.439)  (0.412, 0.514)  (0.408, 0.757)

julia> Ac = ctemplate.(A)
3×4 Array{ColorMixture{N0f16},2} with eltype ColorMixture{N0f16, 2, (RGB{N0f16}(0.0,0.92549,0.36471), RGB{N0f16}(1.0,0.85882,0.0))}:
 (0.57647₁, 0.3451₂)   (1.0₁, 0.92549₂)      …  (0.64706₁, 0.67059₂)
 (0.87059₁, 0.28235₂)  (0.18039₁, 0.25882₂)     (0.04314₁, 0.81569₂)
 (0.73333₁, 0.07843₂)  (0.34902₁, 0.43922₂)     (0.40784₁, 0.75686₂)

and the summary line is a mouthful. We can customize it with Base.showarg but the key question is what to do with the color values. A cute and actually useful form of summary might be something like this:

The nice thing is that it makes it clear that the first element is a green color channel (EGFP) and the second a longer-wavelength reporter. While it's far from a must-have, this kind of nice summary would be difficult to achieve without having a sense of per-colorchannel colors. I know yours lists the colors in cs, but fundamentally I think your proposal is to allow an arbitrary color-synthesis function.

Johnny Chen · Answer 4 · Thu May 05 2022 02:47:38 GMT+0800 (China Standard Time)

Second and more fundamentally, the only real restriction here, compared to the alternative you proposed, is that my implementation only supports RGB conversions of the form sum(w .* cs) where cs is a list of RGB colors, whereas your implementation allows an arbitrary f. However, folks can get that with MappedArrays, by supplying an f(c) -> cnew that generates a new color from c

This is a bit like the argument why I prefer to use StructArray over colorview in JuliaImages/ImageCore.jl#179. My argument here is that if we construct the pixel value with StructArray, then it is easy to dispatch on it:

using ImageCore, StructArrays, FluorophoreColors
channels = (fluorophore_rgb["EGFP"], fluorophore_rgb["tdTomato"])
ctemplate = ColorMixture{N0f16}(channels)

img = reshape([ctemplate(rand(), rand()) for _ in 1:64], 8, 8);
data = collect(channelview(img))
img_soa = StructArray{eltype(img)}(data; dims=1)

to dispatch on it:

function f(img::AbstractArray{T}) where {T<:ColorMixture}
...
end

# optimized implementation for SOA layout
function f(img::StructArray{T}) where {T<:ColorMixture}
...
end

For this stuff, it's quite unclear what we should do with MappedArray. I believe this is also the reason that makes you create this package.

One advantage of not having that flexibility here is that we could display a nice summary of such an array.

~~I think this depends on how we dispatch the Base.show method: should we dispatch on Base.show(::IO, ::ColorMixture), or Base.show(::IO, ::ColorMixture{T,N,FRGB}) where {T,N}?~~

Oh, I see, because FRGB is a value so we need a helper interface to do the dispatch:

Base.show(io::IO, c::ColorMixture{T,N,info) where {T,N,info} = show_color(info, io, c)

function show_color(info::MixedColorInfo{CS, typeof(fluorophore_to_rgb)}, io::IO, c::ColorMixture) where {CS}
    ...
end

Tim Holy · Answer 5 · Thu May 05 2022 18:20:07 GMT+0800 (China Standard Time)

I guess my main point is that taking your idea even further, you don't really need the cs field (it could be hard-coded into f, or captured in a closure). I think the core of your proposal is to allow a generic mapping from a multi-valued color to an RGB color. Like MappedArrays, the flexibility is both a benefit (you can do whatever you want) and a disadvantage (the lack of standardization can make it harder to do other automated things). Dispatch may allow one to have the flexibility without losing functionality elsewhere, of course. So I would cast this decision as "minor" since I think we can get what we want either way.

I am already foreseeing one generalization that will be needed: in ImageView, each independent channel will need its own scaling before multiplying by the channel-assigned color, i.e. sum(w .* g .* cs), where g is a tuple of "gains" that are extracted from an Observable. This is because most biological imaging these days is 16-bit, and this dynamic range is high enough that scaling to appropriate contrast ranges for computer monitors is required before sending to the display. However, g is constant across the entire image plane.