Preprocessing and Model Expectations
Videet Rajeev Parekh commented
Hi Tianqi,
I'm using this tutorial as a guide to host one of my own MobileNetV2 models. I have a couple of questions about the preprocessing you are doing. I'm not used to JS, so I'm lost in this bit of code:
function preprocImage(imageData) {
  const width = imageData.width;
  const height = imageData.height;
  const npixels = width * height;
  const rgbaU8 = imageData.data;
  // Drop alpha channel. Resnet does not need it.
  const rgbU8 = new Uint8Array(npixels * 3);
  console.log(rgbU8.length);
  for (let i = 0; i < npixels; ++i) {
    rgbU8[i * 3] = rgbaU8[i * 4];
    rgbU8[i * 3 + 1] = rgbaU8[i * 4 + 1];
    rgbU8[i * 3 + 2] = rgbaU8[i * 4 + 2];
  }
  // Cast to float and normalize.
  const rgbF32 = new Float32Array(npixels * 3);
  for (let i = 0; i < npixels; ++i) {
    rgbF32[i * 3] = (rgbU8[i * 3] - 123.0) / 58.395;
    rgbF32[i * 3 + 1] = (rgbU8[i * 3 + 1] - 117.0) / 57.12;
    rgbF32[i * 3 + 2] = (rgbU8[i * 3 + 2] - 104.0) / 57.375;
  }
  // Transpose. Resnet expects 3 greyscale images.
  const data = new Float32Array(npixels * 3);
  for (let i = 0; i < npixels; ++i) {
    data[i] = rgbF32[i * 3];
    data[npixels + i] = rgbF32[i * 3 + 1];
    data[npixels * 2 + i] = rgbF32[i * 3 + 2];
  }
  return data;
}
My input to this function is:
ImageData {
data: Uint8ClampedArray(200704) [
14, 12, 26, 255, 13, 11, 25, 255, 11, 9, 22, 255,
11, 9, 20, 255, 11, 11, 21, 255, 11, 14, 21, 255,
11, 15, 18, 255, 10, 16, 16, 255, 12, 21, 18, 255,
18, 29, 23, 255, 23, 36, 26, 255, 24, 38, 25, 255,
17, 33, 20, 255, 15, 32, 16, 255, 17, 36, 17, 255,
21, 40, 21, 255, 17, 27, 18, 255, 23, 33, 25, 255,
25, 31, 27, 255, 22, 28, 26, 255, 24, 28, 29, 255,
29, 30, 34, 255, 43, 42, 47, 255, 64, 63, 69, 255,
70, 69, 74, 255,
... 200604 more items
]
}
I require an array of shape (1,3,224,224), which I believe is the traditional MNet input.
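As a sanity check on the shapes: a flat array of 3 × 224 × 224 = 150528 floats in channel-major order holds the same values as a (1, 3, 224, 224) tensor. Below is a minimal sketch of that correspondence, assuming the planar layout (all of channel 0, then channel 1, then channel 2) that the transpose step in preprocImage produces. nchwIndex is a hypothetical helper written for illustration; it is not part of the tutorial or of TVM.

```javascript
// Hypothetical helper: offset of element (c, y, x) in a flat, channel-major
// (NCHW with N = 1) buffer of size channels * height * width.
function nchwIndex(c, y, x, height, width) {
  return c * height * width + y * width + x;
}

const height = 224;
const width = 224;
const channels = 3;
const totalElements = channels * height * width;

// The last element of the third plane is also the last element of the buffer.
const lastIndex = nchwIndex(2, height - 1, width - 1, height, width);
console.log(totalElements, lastIndex); // 150528 150527
```

If this assumption holds, the flat output would only need to be interpreted with shape (1, 3, 224, 224); the values themselves would not have to be rearranged.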
Questions I had:
- I don't understand what the output of this should be. When I run it myself I get a Float32Array with 150528 elements, i.e. shape (1,150528). Is this the input format that the MXNet ResNet/MobileNetV1 model expects? Could you share a quick overview of what each step does so that I can adapt it for my model?
- Is there a way to generate this array-like data without drawing to a Canvas, for example by parsing the URI directly?
- Can the TVM web dist be minified to make it more storage efficient?
- Is there a better way to reach you? I'm doing a ton of work trying to understand this side of TVM and it would help to be able to reach out to you to learn more.
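To make the first question concrete, here is my current reading of the function as a small sketch: it runs the same three steps (drop alpha, normalize, transpose to planar) in a single pass on a 2 × 1 synthetic image. It assumes a plain object with width, height, and data fields standing in for ImageData. toPlanarFloat32 is a hypothetical name, and the mean and scale constants are copied from preprocImage above; whether they suit a given model is a separate question.

```javascript
// Per-channel constants copied from preprocImage above.
const MEAN = [123.0, 117.0, 104.0];
const STD = [58.395, 57.12, 57.375];

// Hypothetical single-pass variant: interleaved RGBA bytes in,
// planar (all R, then all G, then all B) normalized floats out.
function toPlanarFloat32(image) {
  const npixels = image.width * image.height;
  const out = new Float32Array(npixels * 3);
  for (let i = 0; i < npixels; ++i) {
    for (let c = 0; c < 3; ++c) {
      // Only offsets 0..2 are read; the alpha byte at i * 4 + 3 is skipped.
      out[c * npixels + i] = (image.data[i * 4 + c] - MEAN[c]) / STD[c];
    }
  }
  return out;
}

// Stand-in for ImageData: two pixels, each equal to the channel means,
// so every normalized value should come out as 0.
const tiny = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([123, 117, 104, 255, 123, 117, 104, 255]),
};
const planar = toPlanarFloat32(tiny);
console.log(planar.length); // 6
```

For a 224 × 224 input the same function would return 150528 floats, which matches the length I see from preprocImage.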