Model works on python but not on GO.

Question

shabashaash opened this issue 3 years ago · comments

shabashaash commented 3 years ago

Sorry again for being so persistent, but I would really appreciate the help.

I managed to trace the model (scripting doesn't work in either Python or Go).

It works in Python (poorly, but I only trained it for about 2 hours, so that's okay):

But in Go, loading the same traced file, I get this:

It looks like some of the final layers just don't work.

Here is my Go code for loading:

func main() {
	device := gotch.CudaIfAvailable()

	image, err := vision.Load("/path/to/sample.jpg")
	if err != nil {
		log.Fatal(err)
	}

	imageTs, err := vision.Resize(image, 512, 512)
	if err != nil {
		log.Fatal(err)
	}

	usimage := imageTs.MustUnsqueeze(0, true)

	Img := usimage.MustTotype(gotch.Float, true)

	Img = Img.MustTo(device, true)

	model, err := ts.ModuleLoad("/path/to/traced_model.pt")
	if err != nil {
		log.Fatal(err)
	}

	output := Img.ApplyCModule(model)

	imag := output.MustDetach(false)
	result := imag.MustTo(gotch.CPU, true)

	// result = result.MustUnsqueeze(int64(0), false)

	saveFile := fmt.Sprintf("/path/to/res_%v", "sample.jpg")
	err = vision.Save(result, saveFile)
	if err != nil {
		log.Fatal(err)
	}

	// image := output.MustUnsqueeze(float64(0), false)

	fmt.Printf("done")

	// fmt.Printf("%20.20f\n", output)

}

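Python code to load the traced model:

# Assumed imports (not shown in the original post)
import torch
import torch.backends.cudnn as cudnn
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

cuda = torch.cuda.is_available()
if cuda:
    print('Cuda is available!')
    cudnn.benchmark = True
netG = torch.jit.load('/path/to/traced_model.pt')
if cuda:
    netG = netG.cuda()
for param in netG.parameters():
    param.requires_grad = False
netG.eval()
imgs = Image.open('/path/to/sample.jpg').resize((512,512))
# ToTensor scales the 0-255 pixel values to floats in [0, 1]
img = transforms.ToTensor()(imgs).unsqueeze_(0).cuda()
g_images = netG(img)
save_image(g_images, '/path/to/res_sample.jpg')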

Python code to trace the model:

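# Assumed imports (not shown in the original post)
import torch
import torch.backends.cudnn as cudnn

cuda = torch.cuda.is_available()
if cuda:
    print('Cuda is available!')
    cudnn.benchmark = True
# STRnet2 is the poster's own generator class, defined elsewhere
netG = STRnet2(3)
netG.load_state_dict(torch.load('/path/to/trained_state_dict/STE_2.pth'))
if cuda:
    netG = netG.cuda()
for param in netG.parameters():
    param.requires_grad = False
netG.eval()

# Trace with a random example input of the expected shape
img = torch.rand(1,3,512,512).cuda()
traced_script_module = torch.jit.trace(netG, img)
traced_script_module.save("/path/to/traced_model.pt")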

shabashaash · Answer 1 · Fri Jul 23 2021 06:09:31 GMT+0800 (China Standard Time)

@sugarme After a little research, I think I have found the problem.
In my model I'm using spectral_norm (torch.nn.utils.spectral_norm), which I think is not implemented in your library. The tensor values in Go and Python are not the same: in Python, after the first layer with spectral_norm, I get tensors with values between -1 and 1, but in Go I get unbounded values, without any normalization.

Could you please help with adding this normalization function to the library?

sugarme · Answer 2 · Fri Jul 23 2021 07:32:21 GMT+0800 (China Standard Time)

@shabashaash,

Thank you for reporting the issue. Please verify that I understand you correctly:

* Your trained model runs okay in inference mode, but it does not work when you convert it to JIT and run inference with the JIT model in Python?
* The JIT model loaded and run for inference in Go gave a strange result (the images above)?
* You think the strange result is caused by missing `spectral_norm` in gotch?

shabashaash · Answer 3 · Fri Jul 23 2021 17:32:42 GMT+0800 (China Standard Time)

@sugarme

It works fine in Python in both cases (JIT-converted and not).
Yes, I am loading the JIT model in Go and getting these strange results.
Yes, I think so, because of the tensor values the model returns after the first conv module (it contains an activation function and spectral_norm). The values returned from Python (both JIT and not) lie in the range -1 to 1, but the values Go returns are unbounded.

I tried removing the activation from this module (just not forwarding through it); Python still returns good values and Go still returns junk.

Here is the Python code of my module:

import torch
from torch import nn

class ConvWithActivation(torch.nn.Module):
    """
    SN convolution: conv with spectral normalization
    """
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True):
        super(ConvWithActivation, self).__init__()
        self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias).cuda()
        self.conv2d = torch.nn.utils.spectral_norm(self.conv2d).cuda()    #!!!!! prob?
        self.activation = torch.nn.LeakyReLU(0.2, inplace=True) #!!!!!!!!!!
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight) #!!!!!!!!!!!!!!
    def forward(self, input):
        x = self.conv2d(input)
        return self.activation(x)

Here is an example of the Python values:

And in Go:

(Great screenshot, I know, but I'm too lazy to write it to a file. I think you get the point.)

I can't really remove the spectral norm layer to test what Python would return, because my model already has trained weights. Of course I could retrain it, but this layer is the only thing left: as I said, I tried removing other parts of the module, and Python still returned "normalized" values while Go did not.

sugarme · Answer 4 · Fri Jul 23 2021 18:19:54 GMT+0800 (China Standard Time)

@shabashaash,

I guess your input image is in byte values (0 - 255). Should it be converted to float (0, 1) by dividing by 255.0 before being fed to the model?
BTW, for a JIT model in inference mode, gotch does nothing with the model architecture; everything happens in the PyTorch backend, which is Libtorch C++, unless you compose your model with gotch. In that case you can load the JIT model and fine-tune it with gotch.

Can you double check and try something like:

input := imgTs.MustDiv1(ts.FloatScalar(255.0), true).MustTotype(gotch.Float, true)
output := input.ApplyCModule(model)

to see how you go?
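
For reference, the mismatch described here comes from the Python path: `transforms.ToTensor()` scales 8-bit pixel values to floats in [0, 1], while the Go loading code casts the raw 0-255 values to float without scaling. A minimal stdlib-only sketch of that scaling on a plain slice follows; `bytesToUnit` is a hypothetical helper name used for illustration and is not part of gotch.

```go
package main

import "fmt"

// bytesToUnit scales 8-bit pixel values (0-255) to float32 values in [0, 1].
// Hypothetical helper for illustration only; it is not a gotch function.
func bytesToUnit(pixels []uint8) []float32 {
	out := make([]float32, len(pixels))
	for i, p := range pixels {
		out[i] = float32(p) / 255.0
	}
	return out
}

func main() {
	// Black, a mid-range value, and white.
	fmt.Println(bytesToUnit([]uint8{0, 51, 255}))
}
```

With a tensor library, the same division is applied to the whole image tensor at once, as in the `MustDiv1` call above.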

shabashaash · Answer 5 · Fri Jul 23 2021 19:07:40 GMT+0800 (China Standard Time)

OK, yeah, I think you're right. Going to try it out.

OK, now I'm getting float results between -1 and 1, which is understandable. But how do I convert this output back to bytes so I can save the image properly?

UPD:
@sugarme
Here is my code (I added the Mul at the end to convert the values):

func main() {

	device := gotch.CudaIfAvailable()

	image, err := vision.Load("/home/user/Desktop/erasenet/EraseNetGo/sample.jpg")
	if err != nil {
		log.Fatal(err)
	}

	imageTs, err := vision.Resize(image, 512, 512)
	if err != nil {
		log.Fatal(err)
	}
	// Img := usimage.MustTotype(gotch.Float, false)

	Img := imageTs.MustTo(device, false)

	input := Img.MustDiv1(ts.FloatScalar(255.0), true).MustTotype(gotch.Float, true)

	usimage := input.MustUnsqueeze(0, false)


	model, err := ts.ModuleLoad("/home/user/Desktop/erasenet/EraseNetGo/traced_model.pt")
	if err != nil {
		log.Fatal(err)
	}

	output := usimage.ApplyCModule(model).MustMul1(ts.FloatScalar(255.0), false)

	// fmt.Printf("%20.20f\n", output)

	saveFile := fmt.Sprintf("/home/user/Desktop/erasenet/EraseNetGo/res_%v", "sample.jpg")
	// ts.SaveHwc(output, saveFile)
	err = vision.Save(output, saveFile)
	if err != nil {
		log.Fatal(err)
	}
}

And I'm getting this:

The funniest part is that the model actually works (you can see some blur in places), but the colors are not right.

sugarme · Answer 6 · Fri Jul 23 2021 19:37:03 GMT+0800 (China Standard Time)

@shabashaash,

I don't know exactly what your model is, but the output is logits, and it should go through either softmax or sigmoid to be in (0, 1). Try something like:

logits.MustSoftmax(...).MustMul1(ts.FloatScalar(255.0), false/true)

shabashaash · Answer 7 · Fri Jul 23 2021 19:51:29 GMT+0800 (China Standard Time)

@sugarme

My model is a GAN for removing text from images. I tried sigmoid and got this:

The image is kind of pale, but it looks normal now. I'm going to check my Python API and maybe find a way to format the image properly.
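
One plausible reason for the pale result, assuming the generator's output is already close to image range (the thread reports values roughly between -1 and 1): sigmoid squeezes that interval into a narrow band around 0.5, so dark pixels drift toward mid-grey and bright pixels lose contrast. The sketch below only evaluates the standard sigmoid at a few points; it makes no claim about the actual model.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the standard logistic function 1 / (1 + e^-x).
func sigmoid(x float64) float64 {
	return 1.0 / (1.0 + math.Exp(-x))
}

func main() {
	// Inputs spanning the range reported in the thread.
	for _, x := range []float64{-1, 0, 1} {
		fmt.Printf("sigmoid(%+.0f) = %.3f\n", x, sigmoid(x))
	}
}
```

Inputs in [-1, 1] land between roughly 0.27 and 0.73, which is consistent with a washed-out image once multiplied by 255.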

shabashaash · Answer 8 · Fri Jul 23 2021 20:57:14 GMT+0800 (China Standard Time)

@sugarme
I finally made it! (I just rewrote the Python save_image function in Go with your functions.)

Dude, big thanks, like VERY. Your lib is great and the only one I managed to install and comprehend. Thanks again for your help, I appreciate it!
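
The final Go code was not posted. As a rough sketch of what such a port involves, assuming the Python function in question is `torchvision.utils.save_image`: that function converts float pixels to bytes by multiplying by 255, adding 0.5 (so that truncation rounds to nearest), clamping to 0-255, and casting to uint8. The stdlib-only version below applies those steps to a single value; `unitToByte` is a hypothetical name, not a gotch function.

```go
package main

import "fmt"

// unitToByte converts a float pixel in the nominal range [0, 1] to a byte:
// scale by 255, add 0.5 so truncation rounds to nearest, clamp to [0, 255],
// then truncate. Hypothetical helper for illustration only.
func unitToByte(v float32) uint8 {
	x := v*255.0 + 0.5
	if x < 0 {
		x = 0
	}
	if x > 255 {
		x = 255
	}
	return uint8(x)
}

func main() {
	// Values outside [0, 1] are clamped instead of wrapping around.
	for _, v := range []float32{-0.3, 0, 0.5, 1, 1.2} {
		fmt.Println(v, "->", unitToByte(v))
	}
}
```

The clamp is the important step: multiplying by 255 alone leaves out-of-range values in place, and those can wrap around when cast to bytes, which is one possible source of wrong colors.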