sayakpaul / maxim-tf

The original MAXIM model can accept images of any resolution even though it was trained on 256x256x3 images.

Training at that resolution doesn't restrict the MAXIM model to 256x256x3 inputs. As long as the input image's height and width are both divisible by 64, it works.
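As a rough illustration of the divisible-by-64 condition (a sketch, not the authors' code), the padding needed to bring an arbitrary height and width up to the next multiple of 64 can be computed like this. The helper name `pad_to_multiple` is made up for the example:

```python
def pad_to_multiple(height, width, multiple=64):
    """Return (pad_h, pad_w) so that both padded dimensions divide evenly by `multiple`."""
    # (-x) % m is the distance from x up to the next multiple of m (0 if already aligned).
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    return pad_h, pad_w


for h, w in [(256, 256), (300, 500)]:
    pad_h, pad_w = pad_to_multiple(h, w)
    print((h, w), "->", (h + pad_h, w + pad_w))
```

Height and width are handled independently, so non-square inputs are fine as long as each side ends up a multiple of 64.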

This is how the authors do it:

In our case, the model is built with layers.Input((256, 256, 3)):

maxim-tf/create_maxim_model.py

Line 29 in 12df753

inputs = keras.Input((input_resolution, input_resolution, 3))

If we use (None, None, 3), it throws:

Traceback (most recent call last):
  File "convert_to_tf.py", line 234, in <module>
    main(args)
  File "convert_to_tf.py", line 192, in main
    _, tf_model = port_jax_params(configs, args.ckpt_path)
  File "convert_to_tf.py", line 140, in port_jax_params
    tf_model = Model(**configs)
  File "/Users/sayakpaul/Downloads/maxim-tf/create_maxim_model.py", line 31, in Model
    outputs = maxim_model(inputs)
  File "/Users/sayakpaul/Downloads/maxim-tf/maxim/maxim.py", line 99, in apply
    height=h // (2 ** i),
TypeError: unsupported operand type(s) for //: 'NoneType' and 'int'

From the logs, it might seem obvious that we cannot build the Keras model with (None, None, 3) since there are calculations inside the model that require us to specify the spatial dimensions.
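To make the failure concrete, here is a plain-Python sketch of the kind of per-stage size calculation the traceback points at (`h // (2 ** i)`). The function name `stage_sizes` and the number of stages are assumptions for illustration, not taken from maxim.py:

```python
def stage_sizes(h, w, num_stages=3):
    """Spatial size at each stage, halving per stage (hypothetical helper)."""
    return [(h // (2 ** i), w // (2 ** i)) for i in range(num_stages)]


# With concrete dimensions the arithmetic is fine.
print(stage_sizes(256, 256))

# With unknown (None) dimensions, as in keras.Input((None, None, 3)),
# the same arithmetic fails at model-construction time.
try:
    stage_sizes(None, None)
except TypeError as err:
    print("TypeError:", err)
```

Any layer that needs these sizes as Python integers while the model is being built will hit the same error when the static shape is unknown.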

Do you know of any way to mitigate this problem or any other approach?

@gustheman

One solution might be to re-initialize the model with the input's spatial dimensions every time it receives a new input, load the weights, and then run inference. But that's extremely inefficient.
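One way to soften that cost, sketched here as an assumption and not as what run_eval.py does, is to cache one built model per (height, width) so that repeated resolutions skip the rebuild. `fake_build` is a stand-in; in practice it would construct the Keras model for that resolution and load the weights:

```python
from functools import lru_cache


def make_cached_builder(build_fn, maxsize=8):
    """Wrap a (height, width) -> model builder so each resolution is built once."""

    @lru_cache(maxsize=maxsize)
    def get_model(height, width):
        return build_fn(height, width)

    return get_model


build_calls = []


def fake_build(height, width):
    # Stand-in for "build the model at this resolution and load the weights".
    build_calls.append((height, width))
    return {"input_shape": (height, width, 3)}


get_model = make_cached_builder(fake_build)
get_model(256, 256)
get_model(512, 384)
get_model(256, 256)  # served from the cache, no rebuild
print("builds performed:", len(build_calls))
```

This only helps when the same resolutions recur; a stream of images with all-different sizes still pays the full rebuild cost each time.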

I have added extensive comments in the run_eval.py script to show how to do this.

I've just tried create_maxim_model in a new environment and I didn't get this error.
Can you give me some eval examples so I can test further?

Did you try changing the resolution accepted by keras.Input to (None, None, 3)?

This line of code:

maxim-tf/create_maxim_model.py

Line 29 in 12df753

inputs = keras.Input((input_resolution, input_resolution, 3))

Yes, it works:
```
m3 = Model(variant='M-2')
```

But when I pass input_resolution=512, I get:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/jupyter/maxim-tf/create_maxim_model.py", line 33, in Model
    inputs = keras.Input((*input_resolution, 3))
TypeError: 'int' object is not iterable

maybe I'm doing something wrong?
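The TypeError above suggests that this version of Model unpacks input_resolution as a (height, width) pair, so a bare int such as 512 cannot be unpacked. A minimal sketch of a helper that accepts both forms; `normalize_resolution` is hypothetical and not part of maxim-tf:

```python
def normalize_resolution(input_resolution):
    """Accept either an int (square input) or a (height, width) pair."""
    if isinstance(input_resolution, int):
        return (input_resolution, input_resolution)
    height, width = input_resolution  # raises if it is not a 2-element sequence
    return (height, width)


# Both spellings now produce a shape that can be unpacked safely.
print((*normalize_resolution(512), 3))
print((*normalize_resolution((512, 384)), 3))
```

Under that assumption, passing input_resolution=(512, 512) instead of 512 should also avoid the error without any code change.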

I'll try more tomorrow, I'll ping you when I start

Sure. Let me know what you encounter. Maybe attach a Jupyter Notebook?

Hacked around this by introducing a dynamic_resize flag to run_eval.py.

Is this solution ideal? What would it take to natively support images of any size, perhaps with independent X and Y resolutions that are each a multiple of 64? Would we need to retrain and re-export the model with (None, None, 3)?

I'm keen to help make this work in TFJS, as long as it works on arbitrarily sized images without a big performance or quality hit. I've got a 4090 that I can dedicate to retraining, if needed, and I'm reasonably competent with TF/TFJS for inference.

> From the logs, it might seem obvious that we cannot build the Keras model with (None, None, 3) since there are calculations inside the model that require us to specify the spatial dimensions.

I've managed to adjust this sort of internal issue in the model before. I'll start poking around in the model code to see the resolution-dependent bits.

The changes are being made in #24.

Building the model with `(None, None, 3)`