Featurizer transformations with Image
sonic182 opened this issue · comments
Hi! I have been trying to reproduce the image transformations that I do in Python with the Hugging Face transformers library and its Blip2ImageProcessor class (to feed a BLIP-2 model: https://huggingface.co/Salesforce/blip2-opt-2.7b).
For now I have:
defmodule Whatever do
  def images_to_input(img_links, featurizer) do
    img_links
    # featurizer.size is {h, w} tuple
    |> Enum.reduce([], &download_and_prepare_img(&1, &2, featurizer.size))
    |> then(fn images ->
      # required for more transformations, like normalization, ...
      Bumblebee.apply_featurizer(featurizer, images)
    end)
  end

  # by doing resize with Image package, we avoid using image_nx for this step
  defp prepare_image(img_bin, {h, w}) do
    size = "#{h}x#{w}"
    img = Image.from_binary!(img_bin)
    colorspace = Image.colorspace(img)

    img
    |> Image.thumbnail!(size, fit: :fill)
    |> then(fn img ->
      if colorspace not in [:rgb, :srgb] do
        Image.to_colorspace!(img, :rgb)
      else
        img
      end
    end)
    |> Image.split_alpha()
    |> then(fn {image, _alpha} ->
      Image.to_nx!(image)
    end)
  end
end
The thing is that the Bumblebee featurizer (like the Python one, of course) applies some more transformations that I don't know how to do correctly with the Image package (or vix).
It would be nice for this package to have implementations of the image processors that exist in Python, maybe loading the config file from Hugging Face and building an instance that applies these transformations.
I'm certain I'm not fully understanding what you're after but I'm definitely happy to collaborate on any quality-of-life improvements.
A couple of things I can suggest (that aren't maybe exactly what you're asking):
- Image.flatten/1 is probably better than splitting an alpha band out.
- Converting to the :srgb colorspace probably makes sense in all cases since :rgb is generic. That is, no assumptions about gamma or color primaries. I believe most imaging tools expect :srgb as the colorspace.
- In libvips, the colorspace and the data type are orthogonal. If you're aiming to replicate some of the transforms in the Bumblebee link you provided, the data should be cast.
- For thumbnailing, I'm not sure fit: :fill will be the best option, since the stretch/compress cycle may distort the image enough to make features unrecognisable. I wonder if you'd get better and more consistent results with fit: :cover?
- The "WxH" format for thumbnailing is a bit of a compatibility hack. It's more appropriate to use thumbnail(image, width, height: height), which also avoids string interpolation.
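To make the distortion point concrete, here is a minimal arithmetic sketch. It assumes that :fill scales each axis independently to hit the target size, while :cover scales both axes by the same factor and crops the overflow. FitSketch and scale_factors/2 are hypothetical names used only for illustration; they are not part of Image.

```elixir
defmodule FitSketch do
  # Hypothetical helper for illustration only; not part of Image.
  # Takes {width, height} of the source and of the target and returns
  # the {x, y} scale factors each strategy would apply (under the
  # assumption described above).
  def scale_factors({src_w, src_h}, {dst_w, dst_h}) do
    sx = dst_w / src_w
    sy = dst_h / src_h

    # :cover uses one uniform factor, the larger of the two, so the
    # resized image covers the whole target before cropping.
    uniform = max(sx, sy)

    %{fill: {sx, sy}, cover: {uniform, uniform}}
  end
end

IO.inspect(FitSketch.scale_factors({640, 480}, {224, 224}))
```

For a 640x480 source and a 224x224 target, the :fill factors differ between the two axes (so the aspect ratio changes), while the :cover factors are equal and the aspect ratio is preserved.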
Example (untested)
Applying the thoughts above, I wonder if the following would get you closer to the Bumblebee.Vision.BlipFeaturizer.process_input/2 call:
defp prepare_image(binary, {h, w}) do
  binary
  |> Image.from_binary!()
  |> Image.flatten!()
  |> Image.thumbnail!(w, height: h, fit: :cover)
  |> Image.to_colorspace!(:srgb)
  |> Image.cast!(:f32)
  |> Image.to_nx!()
end
I think that just leaves the normalize_channels(length(featurizer.image_mean)) and normalize_size(featurizer.size) functions. I haven't looked at them yet to see what they actually do, or whether it makes sense for this library to have equivalents.
It's not my intention that Image should replace or be an alternative to Bumblebee (or other parts of Nx). The primary goal is good interoperability. But if there are transforms that make sense for Image in a more generic way, I'm more than happy to implement them.
I think Bumblebee.Utils.Image.normalize_channels/2 just makes sure it's a 3-band image, so it's probably not required in Image since converting to the :srgb colorspace already makes that guarantee.
Similarly, Bumblebee.Utils.Image.normalize_size/2 isn't required, as best I can tell.
Thanks for your suggestions.
I found that resizing the images before the other transformations, such as normalization, sped up my workflow.
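For readers wondering what the normalization step amounts to, here is a minimal Nx sketch. It assumes the usual recipe of rescaling 0..255 pixel values to 0..1 and then applying a per-channel mean/std adjustment; whether the Bumblebee featurizer does exactly this is an assumption. NormalizeSketch and rescale_and_normalize/3 are hypothetical names, and the mean/std values are placeholders rather than the featurizer's real configuration.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule NormalizeSketch do
  # Hypothetical helper for illustration; not part of Image or Bumblebee.
  # Expects a {height, width, 3} tensor with values in 0..255.
  def rescale_and_normalize(tensor, mean, std) do
    tensor
    |> Nx.as_type({:f, 32})
    # Rescale to the 0..1 range.
    |> Nx.divide(255.0)
    # The {3}-shaped mean and std broadcast over the trailing channel axis.
    |> Nx.subtract(Nx.tensor(mean, type: {:f, 32}))
    |> Nx.divide(Nx.tensor(std, type: {:f, 32}))
  end
end

# A tiny all-white 2x2 RGB image standing in for a resized photo.
pixels = Nx.broadcast(Nx.tensor(255, type: {:u, 8}), {2, 2, 3})

# Placeholder statistics; a real featurizer supplies its own values.
normalized =
  NormalizeSketch.rescale_and_normalize(pixels, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

IO.inspect(Nx.shape(normalized))
```

Because this step works on the tensor after resizing, it touches far fewer pixels than it would on the full-size image, which is consistent with the speed-up described above.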