Featurizer transformations with Image

Question

sonic182 opened this issue 5 months ago · comments

Johanderson Mogollon commented 5 months ago

Hi! I have been trying to reproduce the image transformations that I do in Python with the Hugging Face transformers library and its Blip2ImageProcessor class (to feed a BLIP-2 model -> https://huggingface.co/Salesforce/blip2-opt-2.7b).

For now I have:

defmodule Whatever do

  def images_to_input(img_links, featurizer) do
    img_links
    # featurizer.size is a {h, w} tuple; download_and_prepare_img/3 is not shown in this post
    |> Enum.reduce([], &download_and_prepare_img(&1, &2, featurizer.size))
    |> then(fn images ->
      # required for more transformations, like normalization, ...
      Bumblebee.apply_featurizer(featurizer, images)
    end)
  end
  
  # by doing resize with Image package, we avoid using image_nx for this step
  defp prepare_image(img_bin, {h, w}) do
    size = "#{h}x#{w}"
    img = Image.from_binary!(img_bin)
    colorspace = Image.colorspace(img)

    img
    |> Image.thumbnail!(size, fit: :fill)
    |> then(fn img ->
      if colorspace not in [:rgb, :srgb] do
        Image.to_colorspace!(img, :rgb)
      else
        img
      end
    end)
    |> Image.split_alpha()
    |> then(fn {image, _alpha} ->
      Image.to_nx!(image)
    end)
  end
end

The thing is that the Bumblebee featurizer applies some more transformations, as the Python one does of course, and I don't know how to do them correctly with the Image package (or Vix).

It would be nice for this package to implement the image processors that exist in Python, perhaps loading the config file from Hugging Face to build an instance of these transformations.

Kip Cole · Answer 1 · Tue Jan 30 2024 05:29:18 GMT+0800 (China Standard Time)

I'm certain I'm not fully understanding what you're after but I'm definitely happy to collaborate on any quality-of-life improvements.

A couple of things I can suggest (that aren't maybe exactly what you're asking):

Image.flatten/1 is probably better than splitting an alpha band out.
Converting to the :srgb colorspace probably makes sense in all cases since :rgb is generic, that is, it makes no assumptions about gamma or color primaries. I believe most imaging tools expect :srgb as the colorspace.
In libvips, the colorspace and the data type are orthogonal. If you're aiming to replicate some of the transforms in the Bumblebee link you provided, the data should be cast.
For thumbnailing, I'm not sure fit: :fill will be the best option, since the stretch/compress cycle may distort the image so much that features become unrecognisable. I wonder if you'd get better and more consistent results with fit: :cover?
The "WxH" format for thumbnailing is a bit of a compatibility hack. It's more appropriate to use thumbnail(image, width, height: height), which also avoids string interpolation.
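To make the :fill versus :cover point concrete, here is a rough arithmetic sketch in plain Elixir. It assumes :fill scales each axis independently to hit the target size exactly, while :cover uses one uniform scale and crops the overflow. The module and function names are hypothetical, and the 640x480 source and 224x224 target are only illustrative values.

```elixir
defmodule ResizeSketch do
  # Hypothetical helpers for illustration only; not part of the Image library.

  # :fill style: each axis gets its own factor so the result is exactly
  # the target size. Different factors mean the aspect ratio changes.
  def fill_scales({w, h}, {target_w, target_h}) do
    {target_w / w, target_h / h}
  end

  # :cover style: a single factor, the larger of the two ratios, so the
  # scaled image covers the target box; whatever overflows is cropped.
  def cover_scale({w, h}, {target_w, target_h}) do
    max(target_w / w, target_h / h)
  end
end

# Sizes are {width, height}.
{sx, sy} = ResizeSketch.fill_scales({640, 480}, {224, 224})
s = ResizeSketch.cover_scale({640, 480}, {224, 224})

IO.inspect({sx, sy}, label: "fill scales (x, y)")
IO.inspect(s, label: "cover scale")
IO.inspect(round(640 * s), label: "scaled width before crop")
```

With :fill the two factors differ, so shapes in the image get squashed horizontally. With :cover the proportions are kept, and the extra width is trimmed instead.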

Example (untested)

Applying the thoughts above, I wonder if the following would get you closer to the Bumblebee.Vision.BlipFeaturizer.process_input/2 call:

defp prepare_image(binary, {h, w}) do
  binary
  |> Image.from_binary!()
  |> Image.flatten!()
  |> Image.thumbnail!(w, height: h, fit: :cover)
  |> Image.to_colorspace!(:srgb)
  |> Image.cast!(:f32)
  |> Image.to_nx!()
end

I think that just leaves the normalize_channels(length(featurizer.image_mean)) and normalize_size(featurizer.size) functions that I haven't looked at yet to see what they actually do - and whether it makes sense for this library to have equivalents.

It's not my intention that Image should replace or be an alternative to Bumblebee (or other parts of Nx). The primary goal is good interoperability. But if there are transforms that make sense, in a more generic way, for Image, I'm more than happy to implement them.

Kip Cole · Answer 2 · Tue Jan 30 2024 05:37:51 GMT+0800 (China Standard Time)

I think Bumblebee.Utils.Image.normalize_channels/2 just makes sure it's a 3-band image, so it's probably not required in Image since converting to the :srgb colorspace already makes that guarantee.

Similarly, Bumblebee.Utils.Image.normalize_size/2 isn't required as best I can tell.
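For anyone who wants to see what a "make sure it's 3 bands" step might look like on the Nx side, here is a minimal sketch. It is an approximation of the idea, not Bumblebee's actual code: the module name is hypothetical, and it assumes a tensor shaped {height, width, channels}.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule ChannelSketch do
  # Hypothetical helper; an approximation of the idea, not Bumblebee's code.
  # Assumes a tensor shaped {height, width, channels}.
  def to_three_channels(tensor) do
    case Nx.axis_size(tensor, -1) do
      3 ->
        tensor

      4 ->
        # Keep the first three channels, dropping a trailing alpha channel.
        Nx.slice_along_axis(tensor, 0, 3, axis: -1)

      1 ->
        # Repeat a single grey channel three times.
        {h, w, 1} = Nx.shape(tensor)
        Nx.broadcast(tensor, {h, w, 3})

      n ->
        raise ArgumentError, "unsupported channel count: #{n}"
    end
  end
end

# Stand-in for a 2x2 RGBA image.
rgba = Nx.broadcast(Nx.tensor(255, type: :u8), {2, 2, 4})
rgb = ChannelSketch.to_three_channels(rgba)
IO.inspect(Nx.shape(rgb), label: "shape after channel fix")
```

If the image has already been flattened and converted to :srgb before Image.to_nx!/1, the first branch is the only one that should ever run.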

Johanderson Mogollon · Answer 3 · Tue Jan 30 2024 20:31:41 GMT+0800 (China Standard Time)

Thanks for your suggestions

I found that I sped up my workflow by resizing the images before other transformations like normalization.
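As a rough sketch of the normalization that still runs after the resize: assuming the usual recipe of rescaling u8 pixels to [0, 1] and then applying a per-channel mean and standard deviation, it could look like this in Nx. The module name is hypothetical, the 0.5 values are placeholders rather than the real BLIP-2 statistics, and passing the featurizer's image_mean and image_std here is an assumption about how you would wire it up.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule NormalizeSketch do
  # Hypothetical helper. `mean` and `std` are per-channel lists, for example
  # values taken from the featurizer configuration (an assumption).
  def normalize(image, mean, std) do
    image
    |> Nx.as_type(:f32)
    # Rescale 0..255 pixel values to 0.0..1.0.
    |> Nx.divide(255.0)
    # The {3} tensors broadcast across the trailing channel axis.
    |> Nx.subtract(Nx.tensor(mean, type: :f32))
    |> Nx.divide(Nx.tensor(std, type: :f32))
  end
end

# Stand-in for an already resized {height, width, 3} u8 image.
image = Nx.broadcast(Nx.tensor(255, type: :u8), {2, 2, 3})
out = NormalizeSketch.normalize(image, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
IO.inspect(Nx.shape(out), label: "normalized shape")
```

Because every step here touches each pixel, running it on the small resized image rather than the full-size original is consistent with the speed-up described above.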