About glove.42B.300d.dat
hannajiang opened this issue
Thanks for your excellent work. I have a question about running your code: how do I prepare glove.42B.300d.dat, glove.42B.300d.50_words.pkl and glove.42B.300d.50_idx.pkl?
Thank you for your interest!
Actually, this is leftover code and we will clean it up soon. Meanwhile, you can generate the data files by following these steps:
- Download glove word embedding 42B 300d here, unzip it and place glove.42B.300d.txt under data/glove/.
- Open your terminal, change directory to model/datasets, and run python glove.py. Note that you need to install bcolz.
- The script will call load_glove_to_binary() to generate the required data files in data/glove/.
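To make the conversion step more concrete, here is a minimal sketch of what turning the GloVe text file into a word list, a word-to-index map and a block of vectors can look like. It is not the repository's load_glove_to_binary(): that function relies on bcolz, while this sketch uses only the standard library. The function names, the toy file and the output naming (`<prefix>_words.pkl`, `<prefix>_idx.pkl`) are assumptions for illustration, and the files it writes are not guaranteed to be compatible with the repository's loader.

```python
import pickle
import tempfile
from array import array
from pathlib import Path


def load_glove_text(txt_path, dim=300):
    """Parse a GloVe text file into (words, word2idx, flat float32 vectors)."""
    words, word2idx, vectors = [], {}, array('f')
    with open(txt_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip('\n').split(' ')
            if len(parts) != dim + 1:
                continue  # skip lines that are not "word v1 ... v_dim"
            word2idx[parts[0]] = len(words)
            words.append(parts[0])
            vectors.extend(float(v) for v in parts[1:])
    return words, word2idx, vectors


def save_lookup(words, word2idx, out_prefix):
    """Pickle the word list and the index map (hypothetical file naming)."""
    with open(f'{out_prefix}_words.pkl', 'wb') as f:
        pickle.dump(words, f)
    with open(f'{out_prefix}_idx.pkl', 'wb') as f:
        pickle.dump(word2idx, f)


# Toy 3-dimensional file standing in for glove.42B.300d.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'toy.3d.txt'
    sample.write_text('cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n', encoding='utf-8')
    words, word2idx, vectors = load_glove_text(sample, dim=3)
    save_lookup(words, word2idx, Path(tmp) / 'toy.3d')
```

The vector for word w starts at offset word2idx[w] * dim in the flat array, which is the same row-lookup idea a binary embedding file supports.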
Thanks for your reply. I have another question: the files "mitstates/feat_train.npy" and "utzap/feat_train.npy" do not exist. How can I get them?
Thank you for pointing this out. Those are the pre-computed features. Please try one of the following:
- Download the features from here, and place the file under data/[mitstates|ut-zap50k]/
- Remove the --pre_feat flag in line 22 of run.sh.
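Before choosing between the two options, it can help to check which pre-computed feature files are actually present. The following is a small sketch; the helper name is hypothetical, and it assumes the files follow the data/<dataset>/feat_<phase>.npy layout with train, val and test phases.

```python
from pathlib import Path


def missing_feature_files(data_root='data',
                          datasets=('mitstates', 'ut-zap50k'),
                          phases=('train', 'val', 'test')):
    """Return the expected feature files that do not exist under data_root."""
    root = Path(data_root)
    expected = [root / ds / f'feat_{phase}.npy'
                for ds in datasets for phase in phases]
    return [path for path in expected if not path.exists()]


missing = missing_feature_files()
if missing:
    print('Missing pre-computed features:')
    for path in missing:
        print(f'  {path}')
else:
    print('All pre-computed feature files found.')
```

If any file is reported missing, either download the features or run without the --pre_feat flag.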
Please let me know if you find any other issues.
Hi, thank you very much for your interest in our work. I have uploaded a new model file for utzap50k at the original link. Could you please give it a try?
Thanks, there is no problem with the new model file. And I am looking forward to seeing your training code soon.
Hi, our training code is uploaded!
Thanks for the training code. I have a question about feat_train.npy: how is it generated? When I don't use the features you provide and instead use a pretrained resnet18 as vis_backbone without fine-tuning, the val AUC on mitstates only reaches 4.96, which is lower than 6.0, and training needs more epochs. Can you give me some advice? Another question: how do I get the 960-dim visual feature?
Hi, could you try the following code snippet to generate the features and let us know how it works?
import numpy as np
import torch
from torchvision.models import resnet18
from tqdm.autonotebook import tqdm
from model.datasets.CompositionDataset import CompositionDataset

# ResNet-18 without its final fc layer: the output is the 512-dim pooled feature.
model = torch.nn.Sequential(*list(resnet18(pretrained=True).children())[:-1]).eval().cuda()
DATA_DIM = 512

for ds_name in ('ut-zap50k', 'mitstates'):
    for phase in ('train', 'val', 'test'):
        dataset = CompositionDataset(f'data/{ds_name}', phase, split='natural-split', embedding_dict=None, precompute_feat=False)
        dataset_len = len(dataset)
        data = np.zeros((dataset_len, DATA_DIM))
        for i in tqdm(range(dataset_len)):
            image, _, _ = dataset.data[i]
            img = dataset.transform(dataset.loader(image))
            with torch.no_grad():  # inference only, no gradients needed
                feat = model(img.unsqueeze(0).cuda())
            data[i, :] = feat.squeeze().cpu().numpy()
        # Write one feature file per dataset and phase.
        np.save(file=f'data/{ds_name}/feat_{phase}.npy', arr=data)
For the 960-dim feature, we average-pool the output of each of the four residual blocks and then concatenate the pooled features: 64 channels from the 1st block, 128 from the 2nd, 256 from the 3rd and 512 from the 4th, giving 64 + 128 + 256 + 512 = 960 dimensions.
Thanks. When I use this code to extract the features and then train the model, I get results similar to those in the paper and only need 80+ epochs. By rights, shouldn't the two approaches give the same result?
Hi, I guess it may be related to your implementation. Have you checked that the pretrained model loaded successfully and that you use the same transformation on the images?
There haven't been any responses for a few days, so I assume this has been solved.