About glove.42B.300d.dat

Question

hannajiang opened this issue 3 years ago

Thanks for your excellent work, but I have a question about running your code. How do I prepare glove.42B.300d.dat, glove.42B.300d.50_words.pkl and glove.42B.300d.50_idx.pkl?

Ziwei XU · Answer 1 · Wed Aug 18 2021 14:56:13 GMT+0800 (China Standard Time)

Thank you for your interest!

Actually, this is leftover code and we will clean it up soon. Meanwhile, you can generate the data files by following these steps:

1. Download the GloVe 42B 300d word embeddings here, unzip the archive, and place glove.42B.300d.txt under data/glove/.
2. Open your terminal, change directory to model/datasets, and run python glove.py. Note that you need to install bcolz.
3. The script will call load_glove_to_binary() to generate the required data files in data/glove/.
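
For readers who want to see roughly what such a conversion involves, here is a minimal sketch of turning a GloVe text file into a binary matrix plus word and index pickles. It is not the repository's load_glove_to_binary(): that function presumably relies on bcolz to write the .dat file, whereas this sketch swaps in np.save so it runs with NumPy alone. The function name glove_text_to_binary and the output file names are hypothetical.

```python
import pickle
from pathlib import Path

import numpy as np


def glove_text_to_binary(txt_path, out_dir, dim=300):
    """Parse a GloVe text file; save the vectors, the word list, and a word->row index.

    Hypothetical helper for illustration; output names are assumptions.
    """
    txt_path, out_dir = Path(txt_path), Path(out_dir)
    words, word2idx, vectors = [], {}, []

    with txt_path.open(encoding="utf-8") as f:
        for line in f:
            # Each line is "<token> v1 ... v_dim". Splitting from the right
            # keeps the last `dim` fields as the vector components.
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip malformed lines
            word2idx[parts[0]] = len(words)
            words.append(parts[0])
            vectors.append([float(v) for v in parts[1:]])

    matrix = np.asarray(vectors, dtype=np.float32)  # shape: (vocab_size, dim)

    stem = txt_path.stem  # e.g. "glove.42B.300d"
    np.save(out_dir / f"{stem}.npy", matrix)  # NumPy stand-in for the bcolz .dat
    with (out_dir / f"{stem}_words.pkl").open("wb") as f:
        pickle.dump(words, f)
    with (out_dir / f"{stem}_idx.pkl").open("wb") as f:
        pickle.dump(word2idx, f)
    return matrix, words, word2idx
```

Calling glove_text_to_binary('data/glove/glove.42B.300d.txt', 'data/glove') would write three files next to the text file. The code in the repository expects its own file names and format, so use python glove.py for the actual data files.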

Han Jiang · Answer 2 · Wed Aug 18 2021 19:05:56 GMT+0800 (China Standard Time)

Thanks for your reply, but I have another problem: the files "mitstates/feat_train.npy" and "utzap/feat_train.npy" do not exist.

Ziwei XU · Answer 3 · Wed Aug 18 2021 19:22:51 GMT+0800 (China Standard Time)

Thank you for pointing this out. Those are the pre-computed features. Please try one of the following:

- Download the features from here, and place the file under data/[mitstates|ut-zap50k]/.
- Remove the --pre_feat flag in line 22 of run.sh.

Please let me know if you find any other issues.

Han Jiang · Answer 4 · Thu Aug 19 2021 11:35:26 GMT+0800 (China Standard Time)

Thanks for your reply. I think the --pre_feat flag should be kept; otherwise an error occurs because vis_backbone is None. I can get the results for the mitstates dataset, but for ut-zap50k there is a RuntimeError. Maybe utzap_eval_model.state is problematic?

Guangzhi Wang · Answer 5 · Thu Aug 19 2021 14:56:13 GMT+0800 (China Standard Time)

Hi, thank you very much for your interest in our work. I have uploaded a new model file for utzap50k at the original link. Could you please give it a try?

Han Jiang · Answer 6 · Thu Aug 19 2021 15:52:53 GMT+0800 (China Standard Time)

Thanks, there is no problem with the new model file. And I am looking forward to seeing your training code soon.

Guangzhi Wang · Answer 7 · Fri Aug 20 2021 17:16:40 GMT+0800 (China Standard Time)

Hi, our training code is uploaded!

Han Jiang · Answer 8 · Tue Aug 24 2021 22:17:48 GMT+0800 (China Standard Time)

Thanks for your training code. I have a question about feat_train.npy: how do I generate it? When I don't use the features you provide and instead use a pretrained resnet18 as vis_backbone without fine-tuning, the val AUC on mitstates reaches 4.96, which is lower than 6.0, and training needs more epochs. Can you give me some advice? Another question: how do I get the 960-dim visual feature?

Ziwei XU · Answer 9 · Tue Aug 24 2021 23:01:58 GMT+0800 (China Standard Time)

Hi, could you try the following code snippet to generate the features and let us know how it works?

import numpy as np
import torch
from torchvision.models import resnet18
from tqdm.autonotebook import tqdm

from model.datasets.CompositionDataset import CompositionDataset

# ResNet-18 without its final fc layer: outputs the 512-dim pooled feature.
model = torch.nn.Sequential(*list(resnet18(pretrained=True).children())[:-1]).eval().cuda()
DATA_DIM = 512

for ds_name in ('ut-zap50k', 'mitstates'):
    for phase in ('train', 'val', 'test'):
        dataset = CompositionDataset(f'data/{ds_name}', phase, split='natural-split', embedding_dict=None, precompute_feat=False)
        dataset_len = len(dataset)

        data = np.zeros((dataset_len, DATA_DIM))
        # Inference only, so skip gradient tracking to save memory.
        with torch.no_grad():
            for i in tqdm(range(dataset_len)):
                image, _, _ = dataset.data[i]
                img = dataset.transform(dataset.loader(image))
                feat = model(img.unsqueeze(0).cuda())
                data[i, :] = feat.squeeze().cpu().numpy()
        np.save(file=f'data/{ds_name}/feat_{phase}.npy', arr=data)

Guangzhi Wang · Answer 10 · Tue Aug 24 2021 23:09:26 GMT+0800 (China Standard Time)

As for the 960-dim feature, we apply average pooling to the outputs of residual blocks 1 to 4 and then concatenate the pooled features (64 channels from the 1st block, 128 from the 2nd, 256 from the 3rd and 512 from the 4th, so 64 + 128 + 256 + 512 = 960 channels in total, giving a 960-dim feature).

Han Jiang · Answer 11 · Wed Aug 25 2021 14:49:33 GMT+0800 (China Standard Time)

Thanks. When I use the code to extract the features and then train the model, I get results similar to those in the paper, and it only needs 80+ epochs. By rights, shouldn't it be the same?

Guangzhi Wang · Answer 12 · Wed Aug 25 2021 17:23:33 GMT+0800 (China Standard Time)

Hi, I guess it may be related to your implementation. Did you successfully load the pretrained model, and did you use the same transformation on the images?
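
One way to check both points is to compare self-extracted features against the provided ones. The sketch below assumes both files are 2-D .npy arrays with rows in the same sample order, which this thread does not confirm; the function name compare_features and the tolerance are hypothetical.

```python
import numpy as np


def compare_features(path_a, path_b, atol=1e-4):
    """Report how closely two (num_samples, dim) feature files agree."""
    a, b = np.load(path_a), np.load(path_b)
    if a.shape != b.shape:
        # Different shapes usually mean a different split or feature dimension.
        return {"same_shape": False, "shape_a": a.shape, "shape_b": b.shape}

    # Row-wise cosine similarity; the small epsilon avoids division by zero.
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    cosine = (a * b).sum(axis=1) / norms
    return {
        "same_shape": True,
        "max_abs_diff": float(np.abs(a - b).max()),
        "mean_cosine": float(cosine.mean()),
        "allclose": bool(np.allclose(a, b, atol=atol)),
    }
```

A mean cosine similarity well below 1 would suggest that the weights or the image transformation differ, while a shape mismatch points to a different split or feature dimension.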

Guangzhi Wang · Answer 13 · Tue Aug 31 2021 17:03:52 GMT+0800 (China Standard Time)

There haven't been any responses for a few days, so I assume this has been solved.