WellyZhang / CoPINet

Learning Perceptual Inference by Contrasting

Home Page: http://wellyzhang.github.io/project/copinet.html

A problem with the code running on the PGM dataset

wangm-buaa opened this issue

commented

Running this code on the RAVEN dataset gives results roughly matching those reported in the paper, but on the PGM dataset the performance falls far short of the paper's (we only got about 30% accuracy).
We noticed that the RAVEN dataset stores image data as (16, 160, 160), while the PGM dataset stores it as (160, 160, 16). Are there any other details to consider when running on the PGM dataset?
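
A quick way to confirm which layout a given file uses (the file name below is hypothetical) is to print the raw array shape before any reshape:

    import numpy as np

    # Hypothetical sample file; use any .npz from the dataset at hand.
    data = np.load("PGM_neutral_train_0.npz")
    # The thread above reports (16, 160, 160) for RAVEN and
    # (160, 160, 16) for PGM.
    print(data["image"].shape)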

Code provided:

    data = np.load(data_path)
    # RAVEN-style loading: panels are stored panel-first, so this reshape
    # is correct for RAVEN but scrambles PGM samples, which are stored as
    # (160, 160, 16).
    image = data["image"].reshape(16, 160, 160)
    target = data["target"]

    if self.img_size != 160:
        resize_image = []
        for idx in range(16):
            # Note: scipy.misc.imresize was removed in SciPy 1.3; newer
            # environments need an alternative such as PIL or cv2.
            resize_image.append(misc.imresize(image[idx, :, :], (self.img_size, self.img_size)))
        image = np.stack(resize_image)
    image = torch.tensor(image, dtype=torch.float)
    target = torch.tensor(target, dtype=torch.long)

After modification:

    data = np.load(data_path)
    # PGM-style loading: panels are stored channel-last, as (160, 160, 16).
    image = data["image"].reshape(160, 160, 16)
    target = data["target"]

    if self.img_size != 160:
        resize_image = []
        for idx in range(16):
            # Index the panel axis last to match the PGM layout; stacking
            # then yields the panel-first (16, img_size, img_size) array.
            resize_image.append(misc.imresize(image[:, :, idx], (self.img_size, self.img_size)))
        image = np.stack(resize_image)
    # Caveat: if img_size == 160, the array is never restacked and stays
    # (160, 160, 16); that branch would need a transpose as well.
    image = torch.tensor(image, dtype=torch.float)
    target = torch.tensor(target, dtype=torch.long)
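
A simpler variant (a sketch, not from the repo) would be to transpose right after the reshape, restoring the panel-first layout so the rest of the original RAVEN loading code, including the img_size == 160 branch, works unchanged:

    data = np.load(data_path)
    # Move the panel axis to the front: (160, 160, 16) -> (16, 160, 160).
    image = data["image"].reshape(160, 160, 16).transpose(2, 0, 1)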

commented

We noticed the performance is reported for the neutral split, and we have already run the code for 200 epochs on this split. How many epochs might be needed to reach the reported performance?

commented

We did not run 200 epochs to reach that performance; it usually plateaus after about 30 epochs. We ran the code on a 4-GPU server. I'm not sure whether batch norm is a factor, since we did not use synchronized batch norm. Another thing to try is tuning the learning rate.
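
For what it's worth, a minimal sketch of the synchronized-batch-norm swap mentioned above (the toy model is a hypothetical stand-in for CoPINet; nn.SyncBatchNorm only takes effect under DistributedDataParallel, not nn.DataParallel):

    import torch.nn as nn

    # Hypothetical stand-in for CoPINet's conv/BN layers.
    model = nn.Sequential(
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),
        nn.ReLU(),
    )
    # Swap every BatchNorm for SyncBatchNorm so batch statistics are
    # aggregated across GPUs during distributed training.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)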

commented

If your team used parameters different from the defaults in the code, could you please share your settings (such as batch size and learning rate)?
We also have a 4-GPU server. Being greatly interested in your research, we are looking forward to reproducing the PGM results locally and building on them. Thank you very much for your patience and enthusiasm.

commented

We tried the different learning rates you suggested. Each learning rate was trained for 200 epochs on the neutral split of PGM, and the best result was 51.19% (lr = 1e-3). We plan to publish the paper with these numbers unless you have other suggestions.
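
(For later readers: a rough sketch of the sweep described here, with an assumed Adam optimizer and a hypothetical stand-in model; the repo's training script defines the actual defaults.)

    import torch
    import torch.nn as nn

    # Hypothetical stand-in; substitute CoPINet from the repo.
    model = nn.Linear(16 * 80 * 80, 8)
    for lr in (1e-2, 1e-3, 1e-4):  # assumed grid; 1e-3 was the best above
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        # ... train 200 epochs on the PGM neutral split per setting,
        # keeping the checkpoint with the best validation accuracy ...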