hollance / CoreMLHelpers

Types and functions that make it a little easier to work with Core ML in Swift.

UIImage to MLMultiArray?

panky8070 opened this issue

Dear Matthijs,
Can you provide a function that converts a UIImage to an MLMultiArray of shape [W, H, C] or [C, W, H], where W = width, H = height, and C = channels?
It will be a big help.

Regards,
Pankaj

Possibly, but my time is limited. What is the use case for this?

Hi Matthijs,
Thanks for the reply, I understand :).
Many CNN models converted from Keras (.h5) to .mlmodel do not take a CVPixelBuffer as input; they take an MLMultiArray instead, even after passing the flag image_input_names = 'data' in the conversion script. Hence, it is necessary to convert the input image to MLMultiArray (CxWxH) format. I found this possible solution on Stack Overflow:
https://stackoverflow.com/questions/44529869/converting-uiimage-to-mlmultiarray-for-keras-model .
But it didn't work; there are some errors in the given solution. Since you have expertise in iOS and Core ML, maybe you could help us with this conversion. If you get some time, could you have a look?

I think the smart thing to do is to figure out why your model doesn't take a CVPixelBuffer, since that will be much more efficient than converting a UIImage to an MLMultiArray. 😉

Notice that image_input_names='data' only works if your input is really called "data". If you did not rename your input, then it's probably called "input_1" or something.

Hi,
I tried renaming the input as per my model specification and got the same problem. I am facing this with the pretrained Inception V3 from keras.applications, and I think the base model's input shape (None, None, None, 3) may be causing it. I tried changing the input shape, but other errors came up while retraining.
Anyway, thanks for the suggestion. I will try changing the architecture and see if that resolves it.
Thank you :)

I run into the same issue.

I've found that you have to set the input_names field first; if you leave it empty, image_input_names will not take effect.

If you don't have time, can you at least point me in the right direction so I can implement this myself? I need to run a custom model that takes an MLMultiArray as input, and the input needs to come from the camera. I would love to help code up the function if only I knew how.
Edit:

import UIKit
import CoreML

extension UIImage {
    public func resize(to newSize: CGSize) -> UIImage {
        UIGraphicsBeginImageContextWithOptions(newSize, true, 1.0)
        self.draw(in: CGRect(origin: .zero, size: newSize))
        let resizedImage = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()
        return resizedImage
    }

    // Returns the image's pixels as interleaved RGBA bytes (alpha is skipped because of
    // .noneSkipLast, but every pixel still occupies 4 bytes).
    public func pixelData() -> [UInt8]? {
        let dataSize = size.width * size.height * 4
        var pixelData = [UInt8](repeating: 0, count: Int(dataSize))
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        let context = CGContext(data: &pixelData,
                                width: Int(size.width),
                                height: Int(size.height),
                                bitsPerComponent: 8,
                                bytesPerRow: 4 * Int(size.width),
                                space: colorSpace,
                                bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)

        guard let cgImage = self.cgImage else { return nil }
        context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: size.width, height: size.height))

        return pixelData
    }
}

func preprocess(image: UIImage) -> MLMultiArray? {
    let size = CGSize(width: 224, height: 224)

    // Scale pixel values from [0, 255] to [-1, 1].
    guard let pixels = image.resize(to: size).pixelData()?.map({ (Double($0) / 255.0 - 0.5) * 2 }) else {
        return nil
    }

    guard let array = try? MLMultiArray(shape: [3, 224, 224], dataType: .double) else {
        return nil
    }

    // Deinterleave the RGBA data into separate R, G, B planes (channels-first layout).
    let r = pixels.enumerated().filter { $0.offset % 4 == 0 }.map { $0.element }
    let g = pixels.enumerated().filter { $0.offset % 4 == 1 }.map { $0.element }
    let b = pixels.enumerated().filter { $0.offset % 4 == 2 }.map { $0.element }

    let combination = r + g + b
    for (index, element) in combination.enumerated() {
        array[index] = NSNumber(value: element)
    }

    return array
}

I use these functions from the FOOD101 project; they work, but they are super slow. Even they themselves have moved to using CVPixelBuffer as input now.
I'm still interested to know whether there is a faster way to convert a CVPixelBuffer or UIImage to an MLMultiArray.
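For what it's worth, much of the time above goes into the enumerated()/filter passes and the per-element NSNumber boxing. A rough sketch of a faster fill that writes straight through the multi-array's data pointer (the helper name is hypothetical; the (x / 255 - 0.5) * 2 scaling mirrors the preprocess code above):

import CoreML

// Sketch: fill a [3, H, W] double multi-array directly from interleaved RGBA bytes,
// avoiding the intermediate filtered arrays and NSNumber boxing.
// `pixels` is assumed to be the output of pixelData() above (4 bytes per pixel).
func fillMultiArray(from pixels: [UInt8], width: Int, height: Int) -> MLMultiArray? {
    let shape: [NSNumber] = [3, NSNumber(value: height), NSNumber(value: width)]
    guard let array = try? MLMultiArray(shape: shape, dataType: .double) else { return nil }

    let ptr = array.dataPointer.assumingMemoryBound(to: Double.self)
    let channelStride = array.strides[0].intValue
    let rowStride = array.strides[1].intValue

    for y in 0..<height {
        for x in 0..<width {
            let src = (y * width + x) * 4                 // RGBA: 4 bytes per pixel
            for c in 0..<3 {                              // copy R, G, B; skip alpha
                let value = (Double(pixels[src + c]) / 255.0 - 0.5) * 2
                ptr[c * channelStride + y * rowStride + x] = value
            }
        }
    }
    return array
}

This still touches every pixel on the CPU, so the vImage route discussed below is likely faster still.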

There may be vImage functions in the Accelerate framework that can help with this.

You can of course hire me to write this for you. Otherwise I don't think I will ever get around to doing this unless I happen to need this code myself.

I've also run into this need. My scenario is unique in that my input shape takes in an arbitrary width and height (Input Shape (3,)), which it seems coremltools doesn't support. I tried using image_input_names to take in a CVPixelBuffer instead of MLMultiArray but ran into:

# ValueError: not enough values to unpack (expected 3, got 1)
# because of the following line
channels, height, width = array_shape

https://github.com/apple/coremltools/blob/master/coremltools/models/neural_network.py?utf8=%E2%9C%93#L2542-L2558

Input Shape: (3,)  # For example, it can take an MLMultiArray of (3, 1080, 720), but it cannot be unpacked in the following line:

channels, height, width = array_shape

If you look at the conversion code, it's hardcoded to expect a width and height.


What are my options?

  1. Manually convert UIImage to MLMultiArray
  2. Hardcode my model's width and height input shape and retrain
  3. ?

Any feedback would be appreciated.

For anyone else who ran into this, coremltools 2.0b will have some flexible shape utils to assist with this scenario.

        from coremltools.models.neural_network import flexible_shape_utils
        flexible_shape_utils.add_enumerated_multiarray_shapes
        flexible_shape_utils.add_enumerated_image_sizes
        flexible_shape_utils.update_image_size_range
        flexible_shape_utils.update_multiarray_shape_range

However, since it has yet to be released on GitHub, I opted for a fixed-size input for now, as described in this post: https://stackoverflow.com/a/46653912/639773

@Nick31419 What exactly do you need this for?

My model accepts only an MLMultiArray as input, but I need to pass a CVPixelBuffer.

@Nick31419 Is it not easier to change the mlmodel so that it expects an image instead of MLMultiArray? Or is the input to your model not really an image?

@hollance I have a .tflite file of my custom model which I have converted to .mlmodel using the tfcoreml converter, but it doesn't accept a CVPixelBuffer as input. How do I move forward now?
In order to change the model to accept an image, do I have to train the model again?

@Nick31419 You don't need to train the model again, but you do need to tell tfcoreml that the model wants images as input and then convert to mlmodel again. I don't exactly know what the parameter is for that but it does have one (probably something like input_image_names).

@hollance Thanks, that helped.

@hollance I have a set of vImage_Buffers that I created using vImageMultiDimensionalInterpolatedLookupTable_PlanarF that I would like to pass as input to the network. The network takes a [6x224x224] MLMultiArray as input (6 images). I can use the Accelerate functions to convert a vImage_Buffer to a CVPixelBuffer, but I'm still not sure how to convert 6 images into a [6x224x224] MLMultiArray input.

My idea was to convert a [224x224] image buffer into a [1x224x224] MLMultiArray and memcpy them into a [6x224x224] MLMultiArray. But I am not sure how to convert a vImage_Buffer to an MLMultiArray.

Any pointers to solve this would be highly appreciated.

@pavan4 I don't have time to look into this right now, but you need to make sure the MLMultiArray's datatype matches the datatype of the pixels in your image buffer. So for example, if the MLMultiArray expects doubles, you need to give it doubles. Since you're making a PlanarF image, I guess you need to set up the Core ML model so that the MLMultiArray accepts floats, not doubles.
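For illustration, a rough sketch of that per-plane copy, assuming a .float32 multi-array of shape [6, 224, 224] and PlanarF source buffers (the helper name is hypothetical; the same caveat about matching datatypes applies):

import Foundation
import Accelerate
import CoreML

// Sketch: copy one 224x224 PlanarF plane into channel `c` of a [6, 224, 224] float32 array.
// The copy is row-by-row because the vImage buffer's rowBytes may include padding.
func copyPlane(_ buffer: vImage_Buffer, into array: MLMultiArray, channel c: Int) {
    let ptr = array.dataPointer.assumingMemoryBound(to: Float.self)   // assumes dataType == .float32
    let channelStride = array.strides[0].intValue
    let rowStride = array.strides[1].intValue
    let width = Int(buffer.width)

    var src = buffer.data!
    var dst = ptr.advanced(by: c * channelStride)
    for _ in 0..<Int(buffer.height) {
        memcpy(dst, src, width * MemoryLayout<Float>.stride)
        src = src.advanced(by: buffer.rowBytes)
        dst = dst.advanced(by: rowStride)
    }
}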

Looks like I found a little bug which may generate the input type as MLMultiArray instead of CVPixelBuffer.
Here is sample code for converting a TensorFlow PB model to Core ML:

import tfcoreml
import coremltools
from coremltools.proto import NeuralNetwork_pb2

input_tensor_names = 'input_images:0'
output_tensor_names = ['out_features:0']
in_tf_model_path = '/data/models/pnasnet5_large_224_features_optimized.pb'
out_coreml_model_path = '/data/models/pnasnet5.mlmodel'
model_description = "PNASNet5 Large 224px"

# Supply a dictionary of input tensors' name and shape (with batch axis)
input_tensor_shapes = {input_tensor_names: [1, 224, 224, 3]} # batch size is 1

# Call the converter.
coreml_model = tfcoreml.convert(
    tf_model_path=in_tf_model_path,
    mlmodel_path=out_coreml_model_path,
    output_feature_names=output_tensor_names,
    image_input_names=input_tensor_names,
    input_name_shape_dict=input_tensor_shapes,
)

This sample generates the input type as CVPixelBuffer:

Core ML input(s): 
 [name: "input_images__0"
type {
  imageType {
    width: 224
    height: 224
    colorSpace: RGB
  }
}]

But if you add any scope name to the input tensor name:
input_tensor_names = 'panasnet_or_any/input_images:0'
the conversion still works, but the input type becomes MLMultiArray:

Core ML input(s): 
 [name: "input_images__0"
type {
  multiArrayType {
    shape: 3
    shape: 224
    shape: 224
    dataType: DOUBLE
  }
}]

Thus, if your input layer ends up as MLMultiArray instead of CVPixelBuffer, first of all check whether the input tensor name is correct.

I apologize for reviving an old thread, but for me using a MultiArray is the only way that I am able to use flexible input shapes with models converted from PyTorch. The flexible shape bugs are currently preventing me from deploying several different types of models, and I have been trying every possible workaround for weeks now.

There are a number of bug reports on the subject; these two are my posts: one for the unified converter and one using the ONNX converter.

But it's way too slow to convert to MultiArray without acceleration. Has anyone come up with a solution? It appears to be the only viable workaround to what seems like glaring bugs, and I haven't seen much of any response to the related issues.

I could write a Metal kernel to do it; is there an efficient way of transforming an MTLBuffer into a MultiArray?

@3DTOPO I think the main issue with doing the conversion is that you can't simply do a memcpy because the images may have 4 channels instead of 3, and the number of bytes per row may be larger than the stride used by the multi-array. (And of course you'll have to do your own preprocessing / normalization on the pixel data.)

Using a Metal kernel would work fine but you do have the overhead of launching just the one kernel and waiting for the results. This might undo any benefits you'd get from doing it in Metal in the first place. But I'd definitely give it a try since it shouldn't take too long to implement.
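On the MTLBuffer question specifically: one option is to avoid a copy altogether, since MLMultiArray has an initializer that wraps existing memory. A rough sketch, assuming a shared-storage buffer that already holds float data in channels-first [3, H, W] layout (the helper name and the layout are assumptions):

import CoreML
import Metal

// Sketch: wrap the contents of a shared MTLBuffer in an MLMultiArray without copying.
// The buffer must stay alive as long as the multi-array is in use, and its length
// must be at least 3 * height * width * MemoryLayout<Float>.stride bytes.
func multiArray(wrapping buffer: MTLBuffer, height: Int, width: Int) throws -> MLMultiArray {
    let shape: [NSNumber] = [3, NSNumber(value: height), NSNumber(value: width)]
    let strides: [NSNumber] = [NSNumber(value: height * width), NSNumber(value: width), 1]
    return try MLMultiArray(dataPointer: buffer.contents(),
                            shape: shape,
                            dataType: .float32,
                            strides: strides,
                            deallocator: nil)   // memory is owned by the MTLBuffer
}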

Do you mean that images can have 4 channels (RGBA), or something like CMYK? For my purposes I could ignore the alpha channel. If 4 channels (RGBA) are needed for a memcpy, an alpha channel could be set pretty quickly using Core Image. Any hints on how I would go about trying a memcpy?

Since it takes 16 seconds to convert a 1536x1536 image to a multi-array using the old function from Food 101, it seems like anything would be better. Any suggestions on how you would populate the MultiArray from Metal?

As far as I know, for the images you pass into Core ML the alpha channel is ignored.

If you're passing in multi-arrays, you have to remove the alpha channel yourself (i.e. if the model is trained to accept 3 channels, you can't give it 4 channels).

If you grab the pixels from a UIImage or CVPixelBuffer (or whatever) it will usually have the alpha channel inside it. So you cannot simply memcpy to an MLMultiArray because that would copy 4 channels per pixel instead of 3.

(The problem isn't that the image needs to have an alpha channel, but that the MLMultiArray cannot have one.)

It depends a bit on what format your pixels are in, but you can do the following:

  • Use vImageConvert_ARGB8888toPlanarF to convert the image into four separate buffers, one for each channel. This also turns the bytes into floats.
  • For each of the R, G, B buffers, loop through all the rows and memcpy each row into the right position in the MLMultiArray. You need to loop through the rows because the number of bytes per row may be larger than the row stride in the multi-array (there can be padding to make the number of bytes a multiple of 32 or whatever).
  • Add layers to the Core ML model to do pixel normalization. Note: depending on what sort of preprocessing you need, vImageConvert_ARGB8888toPlanarF can already do some of this.

Thanks for your informative reply!

Hmm, so it sounds like the easiest solution is to make my models accept 4 channels. I guess the downside would be 25% performance overhead from inferencing a 4th channel? Or could I ignore the 4th?

Thanks for the Metal strategy! Sounds straightforward enough. I'll give it a shot if just going with 4 channels doesn't make more sense.

Even if you changed your model to accept 4 channels, you still can't simply copy the pixel data. It also needs to be converted from bytes to floats. I think the vImageConvert_ARGB8888toPlanarF method is the way to go (pretty sure Core ML uses something like this internally when you pass a CVPixelBuffer as input).

I see. Well, I guess it's RIP for memcpy!

OK, guess I'll give the vImageConvert_ARGB8888toPlanarF a shot.

The current state of coremltools has me a bit jaded. I've seen just too many bugs in my struggles to get flexible shapes working. It's one of the single most important Apple tools for me, so I pray it gets better soon. I wish they would let us know whether they plan to fix these bugs or not.

I think I now have the data buffers populated from vImageConvert_ARGB8888toPlanarF. I need to figure out how to access the buffer one whole row at a time. I was thinking I could create an MTLTexture with a single-channel float32 pixel format, copy the channel buffer to the texture, and then use an MTLRegion set to a single row with MTLTexture's getBytes to get the data for one row. Does that make sense? It doesn't seem like much overhead to copy the buffer to an MTLTexture.

Once I have the row data, can you offer any suggestions on how to copy it to the MultiArray? I guess I would access the MultiArray pointer, increment it by the row size, and then copy the row data?

Since you've already done vImageConvert_ARGB8888toPlanarF, you now have the 3 planes for R, G, B in three different vImage_Buffer objects (you can ignore the plane for A).

The MLMultiArray stores the data as (3, H, W), so three planes of H*W floats. That's why we used PlanarF.

You don't need to do anything with Metal, just take each vImage_Buffer and memcpy them row-by-row to the MLMultiArray. In pseudo code:

m = MLMultiArray(shape: [3, H, W])
p = m.dataPointer.assumingMemoryBoundTo(Float.self)
C_stride = m.strides[0]
H_stride = m.strides[1]
W_stride = m.strides[2]

r = vImage_Buffer containing the red pixels
r_in = r.data
r_out = p.advanced(by: C_stride*0)
for y in 0..<r.height {
  memcpy(r_out, r_in, W * MemoryLayout<Float>.stride)
  r_out = r_out.advanced(by: H_stride)
  r_in = r_in.advanced(by: r.rowBytes)
}

b = vImage_Buffer containing the blue pixels
b_in = b.data
b_out = p.advanced(by: C_stride*1)
// and so on...

Thanks a million for the pseudo code, I can't tell you how much I appreciate your help!

That is some damn fine pseudo!

It worked perfectly the first time I ran it! Well, the second time, because Core ML insists my model has a batch dimension (I think it's a PyTorch thing?), so once I changed the MLMultiArray to shape [1, 3, H, W] and shifted the strides, it worked perfectly!

And I haven't timed it yet, but it seems plenty fast to even use in a release! I'm so stoked to have flexible shape capability again, thank you, thank you!
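For anyone following along, the change described above amounts to adding the batch axis to the shape and reading the strides one position over (the sizes here are just an example):

import CoreML

// Sketch: with a leading batch axis the shape is [1, 3, H, W] and the channel/row
// strides move one position to the right compared to a [3, H, W] array.
let height = 1536, width = 1536   // example size
let array = try MLMultiArray(shape: [1, 3, NSNumber(value: height), NSNumber(value: width)],
                             dataType: .float32)
let channelStride = array.strides[1].intValue   // strides[0] is now the batch stride
let rowStride = array.strides[2].intValue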

w00t! 😄 It's unfortunate that you had to do this workaround, though... Obviously Core ML is lacking some regression tests.

I agree! I am optimistic they will get things straightened out in time, but now at least I don't have to wait for that.

@3DTOPO How well does vImageConvert_ARGB8888toPlanarF actually work for you? I'm trying to use it in a project right now and it only converts the first plane properly (my pixel buffer is BGRA format, which shouldn't matter, and only the blue color "survives"). vImageConvert_ARGB8888toPlanar8 works fine this way, but PlanarF does not. I'm pretty sure I've used it without problems in the past...

EDIT: never mind. It's due to misleading documentation for the Swift version of this function. The maxFloat and minFloat parameters need to be an array of 4 float values, not a pointer to a single float.

Yeah, I was having issues until I got min and max working properly too. Glad it's working!

Also, I was having intermittent issues until I first set the alpha channel to 1 (using Core Image); then I use vImageConvert_AnyToAny to convert from the current format to ARGB8888, and convert to PlanarF from there.
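For reference, a rough sketch of the ARGB8888-to-PlanarF step with the four-element min/max arrays mentioned above; the function name, the choice to map bytes onto [-1, 1], and the minimal error handling are assumptions here, not the exact code from the thread:

import Accelerate

// Sketch: split an interleaved ARGB8888 buffer into PlanarF planes, folding the
// byte-to-float normalization into the conversion itself.
func makePlanes(from source: inout vImage_Buffer) -> (r: vImage_Buffer, g: vImage_Buffer, b: vImage_Buffer)? {
    let width = source.width
    let height = source.height
    func planarBuffer() -> vImage_Buffer {
        var buffer = vImage_Buffer()
        _ = vImageBuffer_Init(&buffer, height, width, 32, vImage_Flags(kvImageNoFlags))
        return buffer
    }
    var planeA = planarBuffer(), planeR = planarBuffer()
    var planeG = planarBuffer(), planeB = planarBuffer()

    // In Swift, maxFloat/minFloat must be arrays of 4 values (one per channel),
    // not a pointer to a single float. A byte of 255 maps to maxFloat, 0 to minFloat.
    let maxFloats: [Float] = [1, 1, 1, 1]
    let minFloats: [Float] = [-1, -1, -1, -1]

    let error = vImageConvert_ARGB8888toPlanarF(&source, &planeA, &planeR, &planeG, &planeB,
                                                maxFloats, minFloats, vImage_Flags(kvImageNoFlags))
    guard error == kvImageNoError else { return nil }

    free(planeA.data)   // the alpha plane isn't needed for a 3-channel model
    return (planeR, planeG, planeB)
}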

Hi @3DTOPO @hollance,

Sorry for reviving this thread once again! 😅 😅

I am also attempting to get a workaround for flexible image input for a PyTorch-converted Core ML model, but I am having issues converting the image from RGBA to ARGB.

From my understanding, the original CGImage that I have is in RGBA format. According to what you are trying to achieve, I need to convert the CGImage from RGBA to ARGB, then use vImageConvert_ARGB8888toPlanarF to get the 3 channels needed. Is my understanding here correct?

I assumed my understanding above is correct and tried to convert from RGBA to ARGB using vImageConvert_AnyToAny. I do so by specifying CGImageAlphaInfo.first in CGBitmapInfo. However, after converting, the resulting destination format is always RGBA despite the explicit specification. Below are my source and destination formats:

        // Source Image Format
        guard var srcImageFormat = vImage_CGImageFormat(cgImage: cgImage) else {
            return
        }

        // Destination format — argb8888
        guard var destImageFormat = vImage_CGImageFormat(
            bitsPerComponent: 8,
            bitsPerPixel: 32,
            colorSpace: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.first.rawValue), // <-- Specified that A comes first!!
            renderingIntent: .defaultIntent) else {
            return
        }

        // create converter
        let converterUnmanaged = vImageConverter_CreateWithCGImageFormat(
            &srcImageFormat,
            &destImageFormat,
            nil,
            vImage_Flags(kvImagePrintDiagnosticsToConsole),
            nil)

Am I missing something here? Can you elaborate on how you convert the image to ARGB8888 format?

To a lot of the vImage functions, it doesn't matter whether the order is ARGB or RGBA. I think vImageConvert_ARGB8888toPlanarF will work just fine with RGBA. But of course now your alpha channel will be in a different place.

My convert any to ARGB looks like yours except that I am passing nil for colorSpace.
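As a closing illustration of the channel-order point: since vImageConvert_ARGB8888toPlanarF just deinterleaves the four channels positionally, a sketch for RGBA-ordered input could skip the reordering step and simply change which destination buffer receives which plane (the helper name and the 0...1 output range are assumptions):

import Accelerate

// Sketch: deinterleave an RGBA8888 source into separate float planes without first
// converting it to ARGB order. The channels are split in order, so for RGBA input
// we pass the destinations as (R, G, B, A) instead of (A, R, G, B).
func planesFromRGBA(_ source: inout vImage_Buffer,
                    r: inout vImage_Buffer, g: inout vImage_Buffer,
                    b: inout vImage_Buffer, a: inout vImage_Buffer) -> vImage_Error {
    let maxFloats: [Float] = [1, 1, 1, 1]
    let minFloats: [Float] = [0, 0, 0, 0]
    return vImageConvert_ARGB8888toPlanarF(&source, &r, &g, &b, &a,
                                           maxFloats, minFloats,
                                           vImage_Flags(kvImageNoFlags))
}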