U2Net inconsistency

simgt opened this issue a year ago

Hi there!

Thanks @john-rocky for sharing your work, this repo is really amazing 🙂

I was trying to convert U2Net to CoreML before stumbling on your blog post. You patch the converted model afterwards to add the * 255 scaling and set the output to a grayscale image, whereas I add the scaling in PyTorch before the conversion and pass this as the outputs argument of ct.convert:

outputs=[
    ct.ImageType(
        name="maskOutput",
        color_layout=ct.colorlayout.GRAYSCALE,
    )
],
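
As an illustration only (not the exact code used for either model), the "scaling in PyTorch" approach could be sketched as a thin wrapper module. `ScaledMask` is a hypothetical name, and the sketch assumes the wrapped network returns a tuple whose first element is the mask in the range [0, 1]:

```python
import torch
from torch import nn


class ScaledMask(nn.Module):
    """Hypothetical wrapper: take the first output of `base` and scale it to 0..255."""

    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: base(x) returns a tuple and element 0 is the mask in [0, 1].
        mask = self.base(x)[0]
        # Scale to the 0..255 range expected by a grayscale image output.
        return mask * 255.0
```

The wrapped module would then be traced and handed to ct.convert together with the outputs list above, so that the converted model returns the mask as a grayscale image.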

With both of our models, I find inconsistencies between what I get directly from PyTorch and what the CoreML model returns. Here's what I use to run the inputs through both:

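import coremltools as ct
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

from model import U2NET  # assumes the script runs from the root of the official U-2-Net repo

model = U2NET(3, 1)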
model.load_state_dict(torch.load("saved_models/u2net/u2net.pth", map_location="cpu"))
model.eval()

input_image = Image.open("test_data/cat_dog.jpg")

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(320),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)

with torch.no_grad():
    torch_output = model(input_batch)[0]
    print("torch output shape:", torch_output.shape)

# to_pil_image scales a float tensor in [0, 1] to an 8-bit image
torch_mask = transforms.functional.to_pil_image(torch_output[0, :, :, :])
torch_mask.save("torch_mask.png")
print("torch max value =", np.array(torch_mask).max())

mlmodel = ct.models.MLModel("u2net-rocky.mlmodel")
ml_mask = mlmodel.predict({"input" : input_image.resize((320, 320))})["out_p0"]
ml_mask.save("coreml_mask.png")
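
To put a number on the discrepancy rather than comparing the saved PNGs by eye, something like the following could be used. This is a minimal sketch: `mask_diff` is a hypothetical helper, and it assumes both masks are 8-bit grayscale arrays of the same size.

```python
import numpy as np


def mask_diff(a, b):
    """Return (max_abs_diff, mean_abs_diff) between two same-sized 8-bit masks."""
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    a = np.asarray(a, dtype=np.int16)
    b = np.asarray(b, dtype=np.int16)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)
    return int(diff.max()), float(diff.mean())
```

It could be called as `mask_diff(np.array(torch_mask), np.array(ml_mask))`. Note that it raises on a shape mismatch, which can happen with the code above when the test image is not square: `transforms.Resize(320)` only scales the shorter side to 320, while the CoreML input is resized to exactly (320, 320).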

Did you observe similar discrepancies? If not, do you have any idea what I am doing wrong? 👀

[Image: CoreML output (yours)]

[Image: PyTorch output (official repo)]