uzh-rpg / netvlad_tf_open

TensorFlow port of https://github.com/Relja/netvlad

Output does not correspond

AlbertoJaenal opened this issue

Hello. First, thanks for your contribution with this repo.

I downloaded it a couple of days ago and started trying it out. As you suggest, I ran the MATLAB scripts to get the checkpoint. I am using the vd16_pitts30k_conv5_3_vlad_preL2_intra_white model, so I did not need to change anything in the scripts. I got my structed.mat and then followed the steps. I then ran the test_net_from_mat.py script, and the output was:

Took 0.059705 seconds
Layer vgg16_netvlad_pca/:		Max error is 42.000004
F
======================================================================
FAIL: testNetFromMat (__main__.TestNetFromMat)
Need example_stats.mat in matlab folder, which can be generated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_net_from_mat.py", line 50, in testNetFromMat
    self.assertLess(maxod, 0.018)
AssertionError: 42.000004 not less than 0.018

----------------------------------------------------------------------
Ran 1 test in 13.200s

FAILED (failures=1)

I also ran test_nets.py, and the output was:

Took 0.054931 seconds
F
======================================================================
FAIL: testVgg16NetvladPca (__main__.TestNets)
Need example_stats.mat in matlab folder, which can be generated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_nets.py", line 49, in testVgg16NetvladPca
    self.assertLess(np.linalg.norm(out_diff), 0.0053)
AssertionError: 0.040490005 not less than 0.0053

----------------------------------------------------------------------
Ran 1 test in 2.503s

FAILED (failures=1)

I have been comparing the descriptors of some images with the descriptors extracted by the MATLAB implementation, and they are very different. Am I missing something?
I have also downloaded the provided checkpoint, and the results are the same. It seems the implementations differ somewhere. I have also tried running the graph from the .meta file as well as from your code, and the results are always the same. It seems there is something in which TensorFlow and MatConvNet differ.
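To put a number on "very different", one could compute the same kinds of error measures the two failing tests report. The sketch below is a hypothetical helper (`compare_descriptors` is not part of netvlad_tf_open); it assumes both descriptors of the same image are already available as NumPy arrays of equal length, e.g. after exporting the MATLAB descriptor to a file. For reference, the tests above assert a maximum error below 0.018 (test_net_from_mat.py) and a difference norm below 0.0053 (test_nets.py).

```python
import numpy as np


def compare_descriptors(desc_tf, desc_mat):
    """Compare two descriptors of the same image (hypothetical helper).

    desc_tf: descriptor from NetVLAD-TF; desc_mat: descriptor exported from MATLAB.
    """
    a = np.asarray(desc_tf, dtype=np.float64).ravel()
    b = np.asarray(desc_mat, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("descriptor shapes differ: %s vs %s" % (a.shape, b.shape))
    diff = a - b
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    # Cosine similarity is undefined for an all-zero descriptor.
    cosine = float(a.dot(b) / (na * nb)) if na > 0 and nb > 0 else float("nan")
    return {
        "max_abs_error": float(np.max(np.abs(diff))),  # largest element-wise deviation
        "l2_error": float(np.linalg.norm(diff)),       # Euclidean norm of the difference
        "cosine_similarity": cosine,
    }
```

Since NetVLAD descriptors are typically L2-normalized, the cosine similarity and the L2 error carry related information; reporting both makes it easier to compare against either test threshold.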

I am using TensorFlow 1.8.0.

Thanks in advance

Dear Alberto,

thanks for raising this issue. Unfortunately, I will not be able to look into it right away. We tested this code with TensorFlow 1.4.1 and 1.6.0; I hope it's not an issue with the TensorFlow version...

It's encouraging that the error happens so early in the network, though: it seems to be in the very first layer. I no longer remember whether that is the average-image normalization or the first convolution. See https://github.com/uzh-rpg/netvlad_tf_open/blob/master/tests/test_net_from_mat.py#L42-L48

If you have some time, could you increase the 18s to 28s or so in https://github.com/uzh-rpg/netvlad_tf_open/blob/master/tests/test_net_from_mat.py#L47 and repost the output? That should show whether the problem is the image normalization ( https://github.com/uzh-rpg/netvlad_tf_open/blob/master/python/netvlad_tf/net_from_mat.py#L37-L41 ) or the convolution.
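The layer-by-layer check suggested above boils down to finding the first layer whose output exceeds a tolerance. A minimal sketch, assuming you already have the per-layer outputs of both implementations for the same input image as ordered lists of (layer name, array) pairs; the function name and input format are hypothetical, and the default tolerance simply reuses the 0.018 threshold from the failing assertion.

```python
import numpy as np


def first_divergent_layer(tf_outputs, mat_outputs, tol=0.018):
    """Return (layer_name, max_error) for the first layer whose outputs differ
    by tol or more, or (None, 0.0) if every layer is within tolerance.

    tf_outputs, mat_outputs: hypothetical lists of (name, array) in network order.
    """
    if len(tf_outputs) != len(mat_outputs):
        raise ValueError("different number of layers")
    for (name_tf, out_tf), (name_mat, out_mat) in zip(tf_outputs, mat_outputs):
        if name_tf != name_mat:
            raise ValueError("layer order mismatch: %s vs %s" % (name_tf, name_mat))
        max_err = float(np.max(np.abs(np.asarray(out_tf) - np.asarray(out_mat))))
        # Same pass/fail rule as assertLess(maxod, tol): fail when max_err >= tol.
        if max_err >= tol:
            return name_tf, max_err
    return None, 0.0
```

If the first divergent layer is the normalization layer itself, the difference is already present in the input image rather than in the network weights.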

Thanks!

Dear Titus,

thanks for your reply. As you suggested, the problem does not come from the network implementation but from the normalization step. More precisely, I found that the difference between MATLAB and TensorFlow comes from how MATLAB and Python read images. This image shows the difference (a simple subtraction):
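[Image: figure_3, difference (simple subtraction) between the image as read by MATLAB and as read by Python]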
To check, I loaded the MATLAB array for the same image into Python and ran inference on it. The result was correct, so the network implementation is fine.
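The "simple subtraction" shown above can be reproduced with a short NumPy helper. This is a sketch assuming both decodings are available as uint8 arrays of shape (H, W, 3) in the same channel order; `decoded_pixel_difference` is a hypothetical name, not part of the repository.

```python
import numpy as np


def decoded_pixel_difference(img_a, img_b):
    """Summarize the per-pixel difference between two decodings of one image.

    img_a, img_b: uint8 arrays of shape (H, W, 3), e.g. one decoded in MATLAB
    (exported as an array) and one decoded in Python.
    """
    a = np.asarray(img_a)
    b = np.asarray(img_b)
    if a.shape != b.shape:
        raise ValueError("image shapes differ: %s vs %s" % (a.shape, b.shape))
    # Cast to a signed type first: subtracting uint8 arrays directly would wrap around.
    diff = a.astype(np.int16) - b.astype(np.int16)
    return {
        "max_abs_diff": int(np.abs(diff).max()),
        "fraction_changed": float(np.count_nonzero(diff)) / diff.size,
        "diff": diff,  # signed difference image, e.g. for plotting
    }
```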

Again, thanks for your reply and for your contribution.
I am therefore closing this issue.

Annoying. Thanks for figuring this out!

Hi @AlbertoJaenal, I have the same problem. Do you have any advice on how to solve it, or did you just ignore it? Thanks a lot.

Hi @Mendel1, I just ignored it.
That is, the only thing to keep in mind is that you cannot compare NetVLAD-TF representations with NetVLAD-MATLAB representations, or at least you must control how each of them opens the images (I simply extracted all the descriptors I worked with using TF). Apart from that, the NetVLAD implementation offered here works perfectly.
Regards
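One way to follow this advice in code is to record which pipeline produced each descriptor and refuse to compare descriptors from different pipelines. This is a hypothetical sketch: `TaggedDescriptor` and `descriptor_distance` are illustrative names, and the pipeline labels are arbitrary strings you choose (e.g. "tf" for descriptors extracted with NetVLAD-TF, "matlab" for the MATLAB implementation).

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class TaggedDescriptor:
    vector: np.ndarray
    pipeline: str  # hypothetical label for how the image was decoded and described


def descriptor_distance(a, b):
    """Euclidean distance between two descriptors from the same pipeline."""
    if a.pipeline != b.pipeline:
        # Mixing pipelines would compare representations of slightly different inputs.
        raise ValueError("cannot compare %r with %r descriptors" % (a.pipeline, b.pipeline))
    return float(np.linalg.norm(a.vector - b.vector))
```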

When I run the code, I run into the following problem. Do you have any advice on how to solve it?

2019-04-06 11:42:06.680233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10415 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-04-06 11:42:06.782624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10414 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0, compute capability: 6.1)
2019-04-06 11:42:06.905462: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_tensor.cc:170 : Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /data/zhangqiong/checkpoints/vd16_pitts30k_conv5_3_vlad_preL2_intra_white
E

ERROR: testVgg16NetvladPca (__main__.TestNets)
Need example_stats.mat in matlab folder, which can be generated

Traceback (most recent call last):
  File "tests/test_nets.py", line 31, in testVgg16NetvladPca
    saver.restore(sess, nets.defaultCheckpoint())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1775, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /data/zhangqiong/checkpoints/vd16_pitts30k_conv5_3_vlad_preL2_intra_white
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
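The NotFoundError above means TensorFlow found no files matching the checkpoint path passed to `saver.restore`. A quick sanity check is to look for the checkpoint files on disk before restoring. This sketch assumes the checkpoint was written by a TF1 `tf.train.Saver` in its default V2 format, i.e. a `<prefix>.index` file plus one or more `<prefix>.data-XXXXX-of-YYYYY` shards; `missing_checkpoint_files` is a hypothetical helper, not part of the repository.

```python
import glob
import os


def missing_checkpoint_files(prefix):
    """List problems with a TensorFlow checkpoint prefix (empty list if it looks complete).

    Assumes the Saver V2 layout: <prefix>.index and <prefix>.data-* shard(s).
    """
    problems = []
    directory = os.path.dirname(prefix) or "."
    if not os.path.isdir(directory):
        problems.append("directory does not exist: %s" % directory)
        return problems
    if not os.path.isfile(prefix + ".index"):
        problems.append("missing index file: %s.index" % prefix)
    # glob.escape guards against special characters in the path.
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        problems.append("missing data shard(s): %s.data-*" % prefix)
    return problems
```

For example, `missing_checkpoint_files("/data/zhangqiong/checkpoints/vd16_pitts30k_conv5_3_vlad_preL2_intra_white")` should return an empty list if the checkpoint is in place; a non-empty list points to a checkpoint that was never downloaded or generated, or to a wrong path.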

Hi @AlbertoJaenal, I have the same problem and I don't understand why you ignore it. If the RGB values of an image loaded in Python differ from those loaded in MATLAB while the network parameters are the same, then the NetVLAD-TF representations are actually wrong, aren't they? Isn't the right way to use a Python image loader that produces the correct values?

Hi, @dontLoveBugs,

as I understand this issue, the only consequence of the different conversion is that the network's performance will drop when comparing images loaded in Python with images loaded in MATLAB.

But since the problem comes only from small pixel-level differences when reading, the network's performance won't suffer much when comparing images read with the same method. After all, the network outputs a whole-image representation based on appearance, so slightly different inputs give slightly different outputs, but the high-level features remain practically the same, since the network has millions of parameters that I assume are robust to this small "noise".
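Whether such small pixel-level differences matter for place recognition can be checked empirically, for instance by comparing nearest-neighbor retrieval results under the two decoding pipelines. A minimal sketch, assuming you have query and database descriptors for the same images from both pipelines (row i describes the same image in both); `top1_agreement` is a hypothetical helper using brute-force Euclidean nearest neighbors.

```python
import numpy as np


def top1_agreement(queries_a, db_a, queries_b, db_b):
    """Fraction of queries whose nearest database entry is the same under
    pipeline A and pipeline B. Rows are descriptors; row i in each array
    must describe the same image."""
    def nearest(q, db):
        # Squared Euclidean distance between every query and every database row.
        d = ((q[:, None, :] - db[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    na = nearest(np.asarray(queries_a, dtype=float), np.asarray(db_a, dtype=float))
    nb = nearest(np.asarray(queries_b, dtype=float), np.asarray(db_b, dtype=float))
    return float(np.mean(na == nb))
```

An agreement close to 1.0 would support the argument above; the brute-force distance matrix is only practical for small sets, so a dedicated nearest-neighbor library would be preferable for large databases.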

I think this is similar to transferring a model between frameworks (TensorFlow, PyTorch...). The weights and the inference-time operations will differ slightly, but the different implementations still perform similarly.

I can assure you that I have used the descriptors from this implementation of NetVLAD with very satisfactory results (reading the images with Python methods). But if you are not entirely sure of its performance, you can read the images the MATLAB way and then use this network.

Hope this will help you!

@dontLoveBugs so you consider cv2.imread to be "incorrect" 😅? On a serious note, it seems the issue is that MATLAB uses a newer version of libjpeg, hence the difference:
https://stackoverflow.com/questions/31607731/opencv-vs-matlab-different-values-on-pixels-with-imread
I don't know whether Pillow uses the newer version of libjpeg... @dontLoveBugs, would you like to try that (I don't have time right now)? If that fixes the issue, I'd be very happy about a pull request 😁!
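To try other decoders, it helps to decode the same file with each candidate loader and compare the results pixel-wise. The sketch below takes the loaders as plain callables, so it assumes nothing about which libraries are installed; `compare_loaders` is a hypothetical name.

```python
import itertools

import numpy as np


def compare_loaders(path, loaders):
    """Decode `path` with every loader and report the max absolute pixel
    difference for each pair of loaders (None if the shapes differ).

    loaders: dict mapping a label to a callable path -> (H, W, 3) uint8 RGB array.
    """
    # int16 so that differences of uint8 values cannot wrap around.
    decoded = {name: np.asarray(load(path)).astype(np.int16) for name, load in loaders.items()}
    report = {}
    for (n1, a), (n2, b) in itertools.combinations(sorted(decoded.items()), 2):
        report[(n1, n2)] = int(np.abs(a - b).max()) if a.shape == b.shape else None
    return report
```

With OpenCV and Pillow installed, the loaders could be, e.g., `lambda p: cv2.cvtColor(cv2.imread(p), cv2.COLOR_BGR2RGB)` (OpenCV decodes to BGR order) and `lambda p: np.asarray(Image.open(p).convert("RGB"))` with `from PIL import Image`. A MATLAB decoding would have to be exported to an array first.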

P.S.: It's interesting to think that switching the underlying libjpeg library results in an adversarial attack on NetVLAD!:

[Image: attack]

I ran into the same errors. Have you solved it?

How did you get structed.mat?
After I run net_class2struct.m, the error "Index exceeds matrix dimensions" appears.

Looking forward to your reply.
Thanks