junyongyou / triq

TRIQ implementation


TRIQ failure on images of particular size range

jake-foxy opened this issue

Hello,

Thank you for the great implementation of TRIQ.

I am able to run TRIQ successfully on most images; however, a particular range of resolutions seems to cause a failure.

First, I load the TRIQ model:

# create_triq_model is defined in the repo's src/models/triq_model.py
# (import path below assumes running from src/; adjust to your setup)
from models.triq_model import create_triq_model

args = {}
args['n_quality_levels'] = 5
args['backbone'] = 'resnet50'
args['weights'] = 'path/TRIQ.h5'

model = create_triq_model(n_quality_levels=args['n_quality_levels'],
                          backbone=args['backbone'])
model.load_weights(args['weights'])
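The model loads without complaint. As a quick sanity check (my assumption: with n_quality_levels=5 the model outputs one probability per quality level, i.e. shape (1, 5)), a dummy input at a size that turns out to be safe predicts fine:

import numpy as np

# Dummy input at PIL size 512 x 384 -> array shape (384, 512, 3), values in [-1, 1]
dummy = np.random.uniform(-1.0, 1.0, size=(1, 384, 512, 3)).astype(np.float32)
print(model.predict(dummy).shape)  # expected: (1, 5)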

An example image link is below:

https://hpmlawatl.com/wp-content/uploads/2013/07/640x4802.gif

from PIL import Image
import numpy as np

test_image = "/path/640x4802.gif"
image = Image.open(test_image).convert('RGB')
image = np.asarray(image, dtype=np.float32)
image = image[:, :, :3]
# Normalize to [-1, 1], matching the model's expected input range
image /= 127.5
image -= 1.
prediction = model.predict(np.expand_dims(image, axis=0))

This produces the following error:

InvalidArgumentError:  required broadcastable shapes
	 [[node model/tri_q_image_quality_transformer/add_1
 (defined at /home/ubuntu/production/triq/src/models/transformer_iqa.py:197)
]] [Op:__inference_predict_function_10094]


However, I can resize the same image to be either LARGER or SMALLER, and it will run successfully. For example, this image can be resized to either 512 x 384 or 1024 x 768 and TRIQ runs fine.

test_image = "/path/640x4802.gif"
image = Image.open(test_image).convert('RGB')
img_sizes = image.size
print("Original image size is " + str(img_sizes[0]) + " x " + str(img_sizes[1]))

# Pick one of the following cutoffs; both resized versions run fine:
size_cutoff = 1024  # resizes to 1024 x 768
# size_cutoff = 512  # resizes to 512 x 384

if img_sizes[0] != size_cutoff and img_sizes[1] != size_cutoff:
    max_size = max(img_sizes)
    scale_factor = size_cutoff / max_size
    x_dim = round(img_sizes[0] * scale_factor)
    y_dim = round(img_sizes[1] * scale_factor)
    image = image.resize((x_dim, y_dim), Image.ANTIALIAS)

image = np.asarray(image, dtype=np.float32)
image = image[:, :, :3]
image /= 127.5
image -= 1.
prediction = model.predict(np.expand_dims(image, axis=0))
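For now I work around it with a small helper (hypothetical name load_for_triq; it just generalizes the snippet above by always scaling the longer side to a known-good cutoff):

from PIL import Image
import numpy as np

def load_for_triq(path, size_cutoff=512):
    # Scale the longer side to a known-good cutoff, then normalize to [-1, 1]
    image = Image.open(path).convert('RGB')
    scale = size_cutoff / max(image.size)
    new_size = (round(image.size[0] * scale), round(image.size[1] * scale))
    image = image.resize(new_size, Image.LANCZOS)  # LANCZOS is the current name for ANTIALIAS
    arr = np.asarray(image, dtype=np.float32) / 127.5 - 1.0
    return np.expand_dims(arr, axis=0)

prediction = model.predict(load_for_triq("/path/640x4802.gif"))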

To pin this down, I did a bit of empirical testing:

Values of size_cutoff = 513 will fail, while size_cutoff = 512 is okay.

Similarly, size_cutoff = 1057 will fail while size_cutoff = 1056 is okay.

I understand that TRIQ will fail if an image is too small or too large. What I don't understand is why an image of a particular size (640 x 480) fails, while the same image resized smaller (512 x 384) or larger (1024 x 768) runs successfully.
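My own rough guess, purely an assumption on my part based on ResNet50 downsampling its input by an overall factor of 32, is that what matters is the backbone feature-map size rather than the pixel size directly:

import math

def feature_map_hw(height, width, stride=32):
    # ResNet50 reduces spatial dimensions by an overall factor of 32
    return math.ceil(height / stride), math.ceil(width / stride)

print(feature_map_hw(480, 640))    # (15, 20) -> fails
print(feature_map_hw(384, 512))    # (12, 16) -> works
print(feature_map_hw(768, 1024))   # (24, 32) -> works

If that is right, 640 x 480 lands in a band where one feature-map dimension is large while the other is small, which might line up with the sharp cutoffs at 512/513 and 1056/1057 above.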

Any insight you have would be helpful.

Hi, thanks for finding this bug. There is a potential bug at line 180 of transformer_iqa.py. I don't have time to do a full test now, so I won't make any changes yet. But if you rotate your image to 480x640, the model will work. Alternatively, you can change line 180 to:

if tf.shape(x)[1] >= 16 or tf.shape(x)[2] >= 18:

That will also work. So the failure is not caused by this particular size as such; it happens because spatial pooling is not performed, which causes a problem.
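Roughly, the role of that guard (a simplified eager-mode sketch, not the exact source) is to decide whether the backbone features get spatially pooled before the positional embedding is added:

import tensorflow as tf

def maybe_pool(x):
    # Simplified sketch: the positional embedding only covers feature maps up
    # to a fixed grid, so larger maps must be pooled first. With 'and', a
    # 15x20 map (from a 640x480 input) skips pooling yet exceeds the grid;
    # with 'or', it gets pooled and fits. (2x2 max pooling is an assumption;
    # the actual pooling op in TRIQ may differ.)
    if tf.shape(x)[1] >= 16 or tf.shape(x)[2] >= 18:
        x = tf.keras.layers.MaxPool2D(pool_size=(2, 2))(x)
    return x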