daoyuan98 / Relation-CZSL

Official implementation of the TMM paper "Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair Recognition".

About glove.42B.300d.dat

hannajiang opened this issue · comments

Thanks for your excellent work, but I have some questions when running your code. How should I prepare glove.42B.300d.dat, glove.42B.300d.50_words.pkl, and glove.42B.300d.50_idx.pkl?

Thank you for your interest!

Actually, this is leftover code and we will clean it up soon. Meanwhile, you can generate the data files with these steps:

  1. Download the GloVe 42B 300d word embeddings here, unzip the archive, and place glove.42B.300d.txt under data/glove/.
  2. Open a terminal, change directory to model/datasets, and run python glove.py. Note that you need to install bcolz first.
  3. The script will call load_glove_to_binary() to generate the required data files in data/glove/ (see the sketch after this list for what the conversion roughly looks like).
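
For reference, the conversion performed by load_glove_to_binary() roughly follows the common bcolz recipe for GloVe embeddings; the sketch below is an assumption about the internals rather than the repository's exact code, but it produces the three files named above.

import pickle
import numpy as np
import bcolz  # pip install bcolz

glove_dir = 'data/glove'
dim = 300

words, word2idx, idx = [], {}, 0
# disk-backed bcolz array that becomes glove.42B.300d.dat
vectors = bcolz.carray(np.zeros(1), rootdir=f'{glove_dir}/glove.42B.300d.dat', mode='w')

with open(f'{glove_dir}/glove.42B.300d.txt', 'rb') as f:
    for line in f:
        parts = line.decode().split()
        words.append(parts[0])
        word2idx[parts[0]] = idx
        idx += 1
        vectors.append(np.array(parts[1:], dtype=np.float32))

# drop the dummy first entry, reshape to (vocab_size, 300) and flush to disk
vectors = bcolz.carray(vectors[1:].reshape((idx, dim)),
                       rootdir=f'{glove_dir}/glove.42B.300d.dat', mode='w')
vectors.flush()

pickle.dump(words, open(f'{glove_dir}/glove.42B.300d.50_words.pkl', 'wb'))
pickle.dump(word2idx, open(f'{glove_dir}/glove.42B.300d.50_idx.pkl', 'wb'))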

Thanks for your reply, but I have another question: the files mitstates/feat_train.npy and utzap/feat_train.npy do not exist. How can I get them?

Thank you for pointing this out. Those are the pre-computed features. Please try one of the following:

  1. Download the features from here, and place the file under data/[mitstates|ut-zap50k]/
  2. Remove the --pre_feat flag in line 22 of run.sh.

Please let me know in case you have found other issues.

Thanks for your reply. I think the --pre_feat flag should probably be kept, otherwise an error occurs because vis_backbone is None. I can reproduce the results on the MIT-States dataset, but for UT-Zap50K there is a RuntimeError; maybe utzap_eval_model.state is problematic?

Hi, thank you very much for your interest in our work. I have uploaded a new model file for UT-Zap50K at the original link; could you please give it a try?

Thanks, there is no problem with the new model file, and I am looking forward to seeing your training code soon.

Hi, our training code is uploaded!

Thanks for your training code, and I have a question about feat_train.npy: how is it generated? When I don't use the features you provide and instead use a pretrained ResNet-18 as vis_backbone without fine-tuning, the val AUC on MIT-States is 4.96, which is lower than 6.0, and training needs more epochs. Can you give me some advice? Another question: how do you get the 960-dim visual feature?

Hi, could you try the following code snippet to generate the features and let us know how it works?

import numpy as np
import torch
from torchvision.models import resnet18
from tqdm.autonotebook import tqdm

from model.datasets.CompositionDataset import CompositionDataset

# ResNet-18 backbone with the final fc layer removed; output is a 512-d global feature
model = torch.nn.Sequential(*list(resnet18(pretrained=True).children())[:-1]).eval().cuda()
DATA_DIM = 512

for ds_name in ('ut-zap50k', 'mitstates'):
    for phase in ('train', 'val', 'test'):
        dataset = CompositionDataset(f'data/{ds_name}', phase, split='natural-split', embedding_dict=None, precompute_feat=False)
        dataset_len = len(dataset)

        data = np.zeros((dataset_len, DATA_DIM))
        for i in tqdm(range(dataset_len)):
            image, _, _ = dataset.data[i]
            img = dataset.transform(dataset.loader(image))
            with torch.no_grad():
                feat = model(img.unsqueeze(0).cuda())
            data[i, :] = feat.squeeze().cpu().numpy()
        np.save(file=f'data/{ds_name}/feat_{phase}.npy', arr=data)

And for the 960-dim feature, we apply average pooling to the features from the 1st~4th residual blocks and then concatenate the pooled features (64 channels in the 1st block, 128 in the 2nd, 256 in the 3rd and 512 in the 4th, 960 channels in total, hence a 960-dim feature).
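
For concreteness, a minimal sketch of this pooling-and-concatenation (my own wiring of torchvision's resnet18, not necessarily the repository's exact code) could look like:

import torch
from torchvision.models import resnet18

backbone = resnet18(pretrained=True).eval().cuda()

@torch.no_grad()
def extract_960d(img_batch):
    # stem: conv1 -> bn1 -> relu -> maxpool
    x = backbone.conv1(img_batch)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)

    feats = []
    for stage in (backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4):
        x = stage(x)
        # global average pooling over the spatial dimensions of each stage's output
        feats.append(torch.nn.functional.adaptive_avg_pool2d(x, 1).flatten(1))
    # 64 + 128 + 256 + 512 = 960 channels
    return torch.cat(feats, dim=1)

# usage: features = extract_960d(images.cuda())  # shape (batch, 960)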

Thanks. When I use this code to extract the features and then train the model, I can get results similar to those in the paper, and it only needs 80+ epochs. By rights, shouldn't the results be identical?

Hi, I guess it may be related to your implementation. Have you successfully loaded the pretrained model, and are you using the same image transformations?

There haven't been any responses for a few days, so I assume this has been resolved.