tqchen / tvm-webgpu-example

Compiling machine learning to wasm and WebGPU

Home Page: https://tqchen.com/tvm-webgpu-example/

Preprocessing and Model Expectations

videetparekh opened this issue · comments

Hi Tianqi,

I'm using this tutorial as a guide to host one of my own MobileNetV2 models, and I have a couple of questions about the preprocessing you do. I'm not used to JS, so I'm lost in this bit of code:

function preprocImage(imageData) {
    const width = imageData.width;
    const height = imageData.height;
    const npixels = width * height;

    const rgbaU8 = imageData.data;

    // Drop alpha channel. Resnet does not need it.
    const rgbU8 = new Uint8Array(npixels * 3);
    console.log(rgbU8.length)
    for (let i = 0; i < npixels; ++i) {
        rgbU8[i * 3] = rgbaU8[i * 4];
        rgbU8[i * 3 + 1] = rgbaU8[i * 4 + 1];
        rgbU8[i * 3 + 2] = rgbaU8[i * 4 + 2];
    }

    // Cast to float and normalize.
    const rgbF32 = new Float32Array(npixels * 3);
    for (let i = 0; i < npixels; ++i) {
        rgbF32[i * 3] = (rgbU8[i * 3] - 123.0) / 58.395;
        rgbF32[i * 3 + 1] = (rgbU8[i * 3 + 1] - 117.0) / 57.12;
        rgbF32[i * 3 + 2] = (rgbU8[i * 3 + 2] - 104.0) / 57.375;
    }

    // Transpose. Resnet expects 3 greyscale images.
    const data = new Float32Array(npixels * 3);
    for (let i = 0; i < npixels; ++i) {
        data[i] = rgbF32[i * 3];
        data[npixels + i] = rgbF32[i * 3 + 1];
        data[npixels * 2 + i] = rgbF32[i * 3 + 2];
    }
    return data;
}

My input to this function is:

ImageData {
  data: Uint8ClampedArray(200704) [
    14, 12, 26, 255, 13, 11, 25, 255, 11,  9, 22, 255,
    11,  9, 20, 255, 11, 11, 21, 255, 11, 14, 21, 255,
    11, 15, 18, 255, 10, 16, 16, 255, 12, 21, 18, 255,
    18, 29, 23, 255, 23, 36, 26, 255, 24, 38, 25, 255,
    17, 33, 20, 255, 15, 32, 16, 255, 17, 36, 17, 255,
    21, 40, 21, 255, 17, 27, 18, 255, 23, 33, 25, 255,
    25, 31, 27, 255, 22, 28, 26, 255, 24, 28, 29, 255,
    29, 30, 34, 255, 43, 42, 47, 255, 64, 63, 69, 255,
    70, 69, 74, 255,
    ... 200604 more items
  ]
}

I need an array of shape (1, 3, 224, 224), which I believe is the traditional MNet input.
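
To make the layout question concrete, here is a minimal sketch of how I understand the indexing. It assumes the image is 224 x 224 (which matches the 200704 RGBA bytes above) and that the flat output is channel-major, as the transpose loop in preprocImage suggests: all red values first, then green, then blue. `chwIndex` is a hypothetical helper of my own, not something from the tutorial, and `flat` is only a stand-in for the real output.

```javascript
// Hypothetical helper (not part of the tutorial): offset of channel c,
// row y, column x inside a flat, channel-major (CHW) buffer.
function chwIndex(c, y, x, height, width) {
    return c * height * width + y * width + x;
}

// Assumed input size: 224 x 224 RGBA pixels.
const width = 224;
const height = 224;
const npixels = width * height;

// Stand-in for the Float32Array returned by preprocImage.
const flat = new Float32Array(3 * npixels);

// Mark the green value (channel 1) of the pixel at row 2, column 5.
flat[chwIndex(1, 2, 5, height, width)] = 1.0;

console.log(npixels * 4);  // RGBA bytes going in: 200704
console.log(flat.length);  // floats coming out: 150528
```

If that reading is right, the 150528 floats would already be in (3, 224, 224) order, and (1, 3, 224, 224) would only add a leading batch dimension of size 1 without moving any data.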

Questions I had:

  1. I don't understand what the output of this function should be. When I run it myself, I get a Float32Array of shape (1,150528). Is this how the MXNet ResNet/MobileNetV1 model expects its input? Could you share a quick overview of what exactly you do here so I can adapt it for my model?
  2. Is there a way to generate this array-like data without drawing to a Canvas (for example, by parsing the URI directly)?
  3. Can the TVM web dist be minified to make it more storage-efficient?
  4. Is there a better way to reach you? I'm doing a lot of work to understand this side of TVM, and it would help to be able to reach out to you to learn more.