lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

`AttributeError: 'JpegImageFile' object has no attribute 'shape'`

ersamo opened this issue

I was trying the code with the random image, but I need to pass in my own image rather than torch.rand:

my_image = load_img("127074.jpg")
v = Vit()
img = torch.randn(1, 3, 256, 256)
preds = v(img)
print(preds)
p=v(my_image)
print(p)

The code worked when using the random tensor, but when I pass my_image I get:
AttributeError: 'JpegImageFile' object has no attribute 'shape'

@ersamo - a common reason for this is loading the image with PIL (i.e. check your load_img function, I bet it's loading with PIL). You then have a PIL.Image, which has no shape property.
You can force it to an np array with a quick conversion:
import numpy as np
np_image = np.array(my_image)
Alternatively, you can use OpenCV to load the image, since it uses np arrays natively (note it loads channels in BGR order rather than RGB):
import cv2
np_image = cv2.imread("127074.jpg")
At this point, test and verify:
print(f"shape of image is {np_image.shape}")

*Update - actually, I know you have a PIL image, as I just noticed your title references 'JpegImageFile', which is the PIL image class for jpegs:
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x768 at 0x9E4B941C22E9>
<class 'PIL.JpegImagePlugin.JpegImageFile'>

Anyway, just push it to np array and you can then process it.
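Putting that together, a minimal end-to-end sketch of the PIL route (assuming the same filename as above):

from PIL import Image
import numpy as np

my_image = Image.open("127074.jpg")           # a PIL JpegImageFile, which has no .shape
np_image = np.array(my_image)                 # now an HWC uint8 numpy array
print(f"shape of image is {np_image.shape}")  # e.g. (768, 1024, 3)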
Hope that helps!

Thanks a lot, the error disappeared, but I got another:

 B, C, H, W = x.shape
ValueError: not enough values to unpack (expected 4, got 3)

Should I pass something to the model constructor?

@ersamo - you only get 3 values from a shape call on an image you loaded directly:
height, width, channel = myimage.shape

You have a B there (for batch, I guess), but you don't have a batch, you just have a single image. So the error is saying only 3 values were returned while you are expecting 4.

Oh wait, this isn't your code - you are passing it into one of the models, right?
So you need to make it a batch of 1:
Numpy way:
h, w, c = np_image.shape
my_image_batch = np.stack([np_image]).reshape(-1, h, w, c)
Torch way:
import torch
t = torch.tensor(np_image)
my_image_batch = t.unsqueeze(dim=0)

Either way, you now have a batch of 1:
my_image_batch.shape
torch.Size([1, 4000, 3000, 3])
or
(1, 4000, 3000, 3)
depending on whether you did it with np or torch.
That said, your channel order will not match the BCHW signature - you currently have BHWC... but I don't know what code you are passing it to.
That code may already be reshaping it to BCHW, since people normally just pass in the untouched order from loading.

If you use torchvision's to_tensor() (or the ToTensor transform), it will force the right channel ordering.
But in case you need it - you can force the axes to match the CHW format with:
rolled_image = np_image.transpose(2, 0, 1)  # change from HWC to CHW
And then do the np.stack as above to add the 1 for batch.
Or, to move it to torch directly:
img = torch.from_numpy(np_image.transpose((2, 0, 1))).contiguous()
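Putting those pieces together as a quick sketch (reusing np_image from above; the 4000x3000 size is just the earlier example):

import torch

chw = torch.from_numpy(np_image.transpose((2, 0, 1))).contiguous()  # HWC -> CHW
batch = chw.unsqueeze(0)                                            # add the batch dim
print(batch.shape)  # torch.Size([1, 3, 4000, 3000]) -> BCHW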

Appreciating your time, but excuse me, do you mean like this?


v = Vit()
img = "127074.jpg"
np_image = np.array(img)
t = torch.tensor(np_image)
my_image_batch = t.unsqueeze(dim=0)
preds = v(my_image_batch)
print(preds)

as I got:
TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Oh, I see - I didn't read all the code showing you are passing it to Vit.
The code you are using is really just dummy code to test that everything flows through the transformer (i.e. lucidrains is just making a random tensor to push data through and make sure nothing breaks and a reasonable result comes out).
For a real image, you'd want to do the proper transforms: normalize it to the ImageNet mean and std dev, turn it into a pytorch tensor, and add a batch dimension so it can run on GPU if you want.
Then pass it to the model.

Thanks, this is what I am asking for :-) .. how can I start to pass a real image, not a random tensor?

Ok, we're out of sync, as you posted the str error right as I posted mine :)
In the code you just posted, you are not loading the image... you are literally passing in the string "127074.jpg"... hence numpy is complaining it can't work with str chars. You want to load the image, then manipulate it.
Anyway, now that I see what you are doing, you need a proper processing setup if you want to use a real image... basically a quick setup for inference.
I need to go to dinner, but if you want, I can post a working code snippet for you tomorrow that will load the image, prep it into a proper torch tensor, and pass it to the model so you can get your prediction.

Ok, thanks a lot and have a good dinner :) ... I will wait for you, but I will also try myself, and if I get it working I will tell you .. Thanks again

dinner was delayed...so here you go:

import torch
from PIL import Image
import torchvision.transforms as T


inference_transform = T.Compose(
    [
        T.Resize((256, 256)),  # ViT needs an exact square input, so resize both sides
        T.ToTensor(),          # PIL HWC uint8 -> torch CHW float in [0, 1]
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet mean/std
    ]
)

pred_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load image, prep it for inference, and run it
orig_img = Image.open("myimage.jpg")

img_tensor = inference_transform(orig_img)

img_tensor = img_tensor.unsqueeze(0)  # add the batch dimension -> BCHW

tensor_on_device = img_tensor.to(pred_device)

output = model(tensor_on_device)  # model is your ViT instance
I didn't test it, but that gives you all the main steps to work with a real image.
Maybe I can set up a full load-and-run function that's just a callable addition to any code later, since I've seen other people hit the same question of how to prep an image for a trained model before.
(or @lucidrains can make a generic function usable for all the many awesome models here ;)
Anyway, the ToTensor transform handles the ordering change to CHW (pytorch format) now, so you won't need what I mentioned earlier.
And the unsqueeze adds the 1 dimension for batch.
That should get you up and running with real images!
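In case it saves a step, here's a hedged sketch of wiring that into this repo's model (the ViT arguments mirror the README example; adjust num_classes etc. to your setup):

import torch
from vit_pytorch import ViT

# hyperparameters copied from the repo README example
model = ViT(
    image_size=256, patch_size=32, num_classes=1000,
    dim=1024, depth=6, heads=16, mlp_dim=2048,
)
model.to(pred_device)  # keep model and input on the same device
model.eval()

with torch.no_grad():
    output = model(tensor_on_device)  # (1, num_classes) logits
print(output.argmax(dim=-1))          # predicted class index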

Thanks a lot for your effort. I tried it now and got:
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
The error, as I understand it, means the image is on the GPU while the model's weights are on the CPU.

Hi @ersamo
Yes, exactly. From the error above, your image is on the GPU and your model is on the CPU.
To fix:
model.to(pred_device)

Then both are on the GPU and you should be cruising.
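For reference, a minimal sketch of the final device handling, reusing the names from the snippet above:

model.to(pred_device)                          # move the weights to the same device as the input
tensor_on_device = img_tensor.to(pred_device)  # input was already moved earlier
output = model(tensor_on_device)               # no more FloatTensor / cuda.FloatTensor mismatch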