wzk1015 / CNMT

[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

Home Page: https://arxiv.org/pdf/2012.03662.pdf

no ocr confidence in imdb_train

ZJ-Zhao opened this issue

I used your image dataset to train my model, but I cannot find the OCR confidence in sample_list.
[screenshot: fields of a sample_list entry from imdb_train.npy]
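
For anyone checking the same thing, a minimal sketch of how one might inspect the imdb file directly. The path, and the convention that index 0 holds metadata, are assumptions based on pythia/mmf-style imdbs; adjust to your copy:

import numpy as np

# Load the TextCaps imdb (path is an assumption; point it at your copy).
imdb = np.load("data/imdb/m4c_textcaps/imdb_train.npy", allow_pickle=True)

# In pythia/mmf-style imdbs, imdb[0] is usually metadata and real entries
# start at index 1 (an assumption; verify against your file).
entry = imdb[1]
print(sorted(entry.keys()))
print(entry.get("ocr_confidence", "no ocr_confidence in this entry"))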

I checked imdb_train and it works fine in my environment. Can you provide the full traceback of the error?

Thanks for looking into it!
I tried to port your config to the newest mmf version. I can find ocr_confidence now, but the attribute ocr_tokens is missing.
I'm sure that I added it in the textvqa dataset.py:

# truncate the OCR tokens to the answer processor's max length
max_len = self.config.processors.answer_processor.params.max_length
sample.ocr_tokens = ocr_tokens[:max_len]

When I start training, it fails with this error:
Traceback (most recent call last):
  File "/data/zjzhao/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/data/zjzhao/mmf/mmf_cli/run.py", line 68, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/data/zjzhao/mmf/mmf_cli/run.py", line 58, in main
    trainer.train()
  File "/data/zjzhao/mmf/mmf/trainers/mmf_trainer.py", line 123, in train
    self.training_loop()
  File "/data/zjzhao/mmf/mmf/trainers/core/training_loop.py", line 31, in training_loop
    self.run_training_epoch()
  File "/data/zjzhao/mmf/mmf/trainers/core/training_loop.py", line 89, in run_training_epoch
    report = self.run_training_batch(batch, num_batches_for_this_update)
  File "/data/zjzhao/mmf/mmf/trainers/core/training_loop.py", line 159, in run_training_batch
    report = self._forward(batch)
  File "/data/zjzhao/mmf/mmf/trainers/core/training_loop.py", line 176, in _forward
    model_output = self.model(prepared_batch)
  File "/data/zjzhao/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/zjzhao/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/data/zjzhao/mmf/mmf/models/base_model.py", line 171, in __call__
    model_output = super().__call__(sample_list, *args, **kwargs)
  File "/data/zjzhao/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/zjzhao/mmf/mmf/models/cnmt.py", line 150, in forward
    self._forward_ocr_encoding(sample_list, fwd_results)
  File "/data/zjzhao/mmf/mmf/models/cnmt.py", line 234, in _forward_ocr_encoding
    num_samples = len(sample_list.ocr_tokens)
  File "/data/zjzhao/mmf/mmf/common/sample.py", line 166, in __getattr__
    "Valid choices are {}".format(key, self.fields())
AttributeError: Key ocr_tokens not found in the SampleList. Valid choices are ['question_id', 'image_id', 'image_feature_0', 'image_info_0', 'image_feature_1', 'image_info_1', 'text', 'text_len', 'obj_bbox_coordinates', 'context', 'context_tokens', 'context_tokens_enc', 'context_feature_0', 'context_info_0', 'context_feature_1', 'context_info_1', 'order_vectors', 'ocr_bbox_coordinates', 'ocr_confidence', 'sampled_idx_seq', 'train_prev_inds', 'train_loss_mask', 'targets', 'caption_str', 'ref_strs', 'dataset_name', 'dataset_type']
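
For context, a minimal sketch of where such an assignment usually lives in the dataset, modeled on pythia/mmf's m4c_textvqa dataset. The method name add_sample_details and the sample_info field names are assumptions, not verified against this repo:

class M4CTextVQADataset:  # stub for illustration only
    def add_sample_details(self, sample_info, sample):
        # Sketch of the sample-construction hook (method name assumed).
        ocr_tokens = sample_info["ocr_tokens"]

        # Truncate to the answer processor's max length, as in the snippet above.
        max_len = self.config.processors.answer_processor.params.max_length
        sample.ocr_tokens = ocr_tokens[:max_len]

        # ocr_confidence must be attached the same way, or it will not appear
        # in the batched SampleList at forward time.
        sample.ocr_confidence = sample_info.get("ocr_confidence")
        return sample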

pythia/mmf works like this:
1. you provide the -dataset m4c_textcaps arg when running run.py (here I put it in train.sh);
2. mmf finds the dataset builder in /datasets/captioning/m4c_textcaps/builder.py (since the builder is registered there, in line 7);
3. M4CTextCapsBuilder extends M4CTextVQABuilder, which instantiates M4CTextVQADataset at runtime.

If you are sure sample.ocr_tokens = xxx is added in /datasets/vqa/m4c_textvqa/dataset.py, you can use a debugger (or simply add a print statement) to check whether that statement is actually executed.

You can also refer to mmf's documentation for details; a sketch of the registration chain follows.
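
To make the chain above concrete, here is a minimal sketch of the registration pattern. Module paths mirror the thread's layout and are assumptions; exact paths and signatures may differ across mmf versions:

# builder.py (sketch): registering the captioning dataset under the name
# that the -dataset arg looks up. Import paths mirror the thread's layout
# and are assumptions; adjust them to your tree.
from mmf.common.registry import registry
from mmf.datasets.vqa.m4c_textvqa.builder import M4CTextVQABuilder
from mmf.datasets.vqa.m4c_textvqa.dataset import M4CTextVQADataset


@registry.register_builder("m4c_textcaps")  # what "registered in line 7" refers to
class M4CTextCapsBuilder(M4CTextVQABuilder):
    def __init__(self):
        super().__init__()
        self.dataset_name = "m4c_textcaps"
        # The builder reuses the TextVQA dataset class, which is why edits
        # belong in /datasets/vqa/m4c_textvqa/dataset.py.
        self.dataset_class = M4CTextVQADataset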

I'm sure that sample.ocr_tokens = xxx has been executed; I verified it with a print statement.
And num_samples = len(sample_list.ocr_tokens) sometimes executes correctly.
It's so weird.
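
A hedged debugging suggestion, not from the thread: since the traceback shows that SampleList exposes fields(), one could guard the forward step and log which batches arrive without ocr_tokens, to see whether only some batches lose the field. This helper is hypothetical:

def check_ocr_tokens(sample_list):
    # sample_list is an mmf SampleList; fields() is taken from the traceback.
    # Call this just before self.model(prepared_batch) in the trainer's _forward.
    if "ocr_tokens" not in sample_list.fields():
        # image_id is listed among the SampleList's valid fields above;
        # getattr with a default swallows the AttributeError if it is absent.
        print("ocr_tokens missing; image_id =",
              getattr(sample_list, "image_id", "<unavailable>"))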