TextPair Classification with multilabel Problem
felixvor opened this issue · comments
Question
Not sure if this is a bug or I am doing something wrong here. I am trying to train a model with multilabel classification and two text inputs (i.e. textpair).
I prepared an example dataset with the following format:
```
text	text_b	label
Sentence A.	Sentence B.	0,1,2
Another A.	Another B.	1,2
...
```
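For reference, here is a minimal sketch of how rows in this format could be parsed into multilabel targets. The function name and the one-hot encoding are my own illustration of the data layout, not FARM's internal code:

```python
def parse_multilabel_row(row, label_list, delimiter="\t"):
    """Split a TSV row into (text, text_b, one_hot) where the label
    column holds comma-separated label ids, e.g. "0,1,2"."""
    text, text_b, label_str = row.split(delimiter)
    labels = label_str.split(",") if label_str else []
    # One-hot encode against the known label list: one 0/1 slot per label.
    one_hot = [1 if lbl in labels else 0 for lbl in label_list]
    return text, text_b, one_hot

text, text_b, one_hot = parse_multilabel_row(
    "Another A.\tAnother B.\t1,2", label_list=["0", "1", "2"]
)
# one_hot is [0, 1, 1]: labels "1" and "2" are set, "0" is not.
```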
I found that using TextPairClassificationProcessor
with multilabel=True
seems to work fine to prepare the data for training, which I checked with the debugger. But on training start I get the following error:
```
...
06/22/2021 00:14:41 - INFO - farm.modeling.language_model - Loaded bert-base-cased
06/22/2021 00:14:41 - INFO - farm.modeling.prediction_head - Prediction head initialized with size [768, 3]
06/22/2021 00:14:44 - INFO - farm.modeling.optimization - Loading optimizer `TransformersAdamW`: '{'correct_bias': False, 'weight_decay': 0.01, 'lr': 2e-05}'
06/22/2021 00:14:45 - INFO - farm.modeling.optimization - Using scheduler 'get_linear_schedule_with_warmup'
06/22/2021 00:14:45 - INFO - farm.modeling.optimization - Loading schedule `get_linear_schedule_with_warmup`: '{'num_warmup_steps': 67.2, 'num_training_steps': 672}'
06/22/2021 00:14:47 - INFO - farm.train -
***Growing***
Train epoch 0/1 (Cur. train loss: 0.0000):   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "text_pair_classification.py", line 131, in <module>
    text_pair_classification()
  File "text_pair_classification.py", line 98, in text_pair_classification
    trainer.train()
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\farm\train.py", line 301, in train
    per_sample_loss = self.model.logits_to_loss(logits=logits, global_step=self.global_step, **batch)
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\farm\modeling\adaptive_model.py", line 386, in logits_to_loss
    all_losses = self.logits_to_loss_per_head(logits, **kwargs)
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\farm\modeling\adaptive_model.py", line 370, in logits_to_loss_per_head
    all_losses.append(head.logits_to_loss(logits=logits_for_one_head, **kwargs))
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\farm\modeling\prediction_head.py", line 360, in logits_to_loss
    return self.loss_fct(logits, label_ids.view(-1))
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\torch\nn\modules\loss.py", line 1121, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Users\Admin\anaconda3\envs\farm\lib\site-packages\torch\nn\functional.py", line 2824, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
ValueError: Expected input batch_size (16) to match target batch_size (48).
```
In the last line, '16' is my batch size and '48' is exactly batch_size * num_prediction_head_outputs (I verified this with different batch sizes and label lists). I hit my limits when trying to debug your training and loss calculation code, and I was wondering if you could help me find a solution. Is FARM suitable for multilabel classification with text pairs? I would like to contribute and make this use case more accessible, but I currently do not know where to start looking for a fix. Maybe you have an idea?
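The mismatch can be reproduced with simple shape arithmetic: a single-label `CrossEntropyLoss` expects one target per example (shape `(batch_size,)`), while multilabel targets carry one 0/1 slot per label (shape `(batch_size, num_labels)`), so flattening them with `.view(-1)` yields `batch_size * num_labels` targets. A plain-Python sketch of just the shapes (no FARM code):

```python
def flattened_target_size(batch_size, num_labels):
    """Multilabel targets are a 0/1 slot per label; view(-1) flattens
    the (batch_size, num_labels) matrix into one dimension."""
    targets = [[0] * num_labels for _ in range(batch_size)]
    flat = [slot for row in targets for slot in row]
    return len(flat)

# CrossEntropyLoss compares len(logits) == batch_size against this
# flattened length, which is exactly the ValueError seen above.
print(flattened_target_size(16, 3))  # 48, as in the traceback
print(flattened_target_size(10, 5))  # 50, as in the later experiment
```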
Any help would be appreciated :)
Hi @DieseKartoffel The data format looks good to me (it is the same as in our multilabel classification example, https://github.com/deepset-ai/FARM/blob/master/examples/doc_classification_multilabel.py, except for the additional `text_b` input). Are you providing `label_list = ["0", "1", "2"]` to the `TextPairClassificationProcessor`? I will try to reproduce the error message on my side.
Hey Julian, thank you for looking into this. Yes, I tried to stay close to your examples for debugging :-)

I made sure to use the correct labels; if a label from the dataset is not part of `label_list`, FARM already outputs a useful error and does not start the training. I also tried different labels with correspondingly prepared datasets. For example, with batch size 10 and `label_list=["a", "b", "c", "d", "e"]` I got `ValueError: Expected input batch_size (10) to match target batch_size (50)`.
So far, I could not replicate the error. Could you maybe share some code and a small data example? What I did so far is the following:

- I took the https://github.com/deepset-ai/FARM/blob/master/examples/doc_classification_multilabel.py example
- Replaced `TextClassificationProcessor` with `TextPairClassificationProcessor`
- Changed the `basic_texts` variable to contain pairs of texts by copying the `text` value to `text_b`
- Added a `text_b` column to the `train.tsv` and `val.tsv` datasets that contains the same text as the `text` column
```python
basic_texts = [
    {"text": ("You ... ...", "You ... ...")},
    {"text": ("What a lovely world", "What a lovely world")},
]
```
The output that I get is the following:

```
[{'task': 'text_classification', 'predictions': [{'start': None, 'end': None, 'context': "('You ... ...', 'You ... ...')", 'label': "['toxic', 'obscene', 'insult']", 'probability': array([0.93692017, 0.19396962, 0.8908834 , 0.10999262, 0.8351795 ,
       0.2840815 ], dtype=float32)}, {'start': None, 'end': None, 'context': "('What a lovely world', 'What a lovely world')", 'label': '[]', 'probability': array([0.371408  , 0.00837683, 0.1528986 , 0.00711144, 0.16077891,
       0.01845325], dtype=float32)}]}]
```
I was able to get it working by reproducing your approach step by step. I then compared that code to my project and found that I was using the wrong prediction head... a very easy fix which I should have spotted from the start. Thank you very much for your help!
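For future readers: the underlying pattern is that a single-label head scores one class per example with a cross-entropy loss, while a multilabel head needs a BCE-with-logits loss that takes a 0/1 target for every label (in FARM this is, if I recall the head names correctly, `MultiLabelTextClassificationHead` rather than `TextClassificationHead`). A pure-Python sketch of the multilabel loss, just to show that logits and targets share the same `(batch_size, num_labels)` shape so no batch-size mismatch can occur:

```python
import math

def bce_with_logits(logits, targets):
    """Per-example multilabel loss: sigmoid each logit, then average
    binary cross-entropy over the labels of that example."""
    assert len(logits) == len(targets)  # shapes line up by construction
    losses = []
    for logit_row, target_row in zip(logits, targets):
        row_loss = 0.0
        for z, y in zip(logit_row, target_row):
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            row_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        losses.append(row_loss / len(logit_row))
    return losses

# Two examples, three labels each: one loss value per example,
# unlike CrossEntropyLoss fed with flattened multilabel targets.
per_sample = bce_with_logits(
    logits=[[2.0, -1.0, 0.5], [-2.0, 3.0, 1.0]],
    targets=[[1, 0, 1], [0, 1, 1]],
)
```

This mirrors what `torch.nn.BCEWithLogitsLoss(reduction="none")` computes, which is the loss family a multilabel prediction head is built on.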