deepset-ai / FARM

Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

Home Page: https://farm.deepset.ai

Something weird with Inferencer for Text Pair Classification?

rodrigoheck opened this issue

Describe the bug
I am training a model for aspect-based sentiment analysis (extracting the sentiment toward specific aspects of a text). To do this, I am using FARM's Text-Pair Classification. The model trains fine, but when I run inference I get a wrong, static result that does not change with my text_b input. This suggests to me that inference is not using 'text_b' correctly.

Error message
Here are some examples:
Screenshot 2021-02-22 at 22:35:25
Screenshot 2021-02-22 at 22:48:49
Screenshot 2021-02-22 at 22:35:38

Expected behavior
Even if the predicted label is wrong, the final probability should at least be somewhat sensitive to what I pass as "text_b".

Additional context
I tested whether training was treating this as a plain text classification task instead of a text-pair classification task. I substituted TextClassificationProcessor for TextPairClassificationProcessor and, sure enough, the accuracy decreased substantially. So "text_b" is not being ignored at training time.

System:

  • OS: Google Colab
  • GPU/CPU: K80
  • FARM version: 0.7.0

Hey @rodrigoheck, good catch.
I created a fix in the linked PR.
The fix is a change to the input format, not to the code. So you can keep the current FARM version (0.7.0) and your trained model, and just pass a different input to the inference_from_dicts method. Previously the input was a dict with the keys "text" and "text_b". Now it is a dict with the single key "text", whose value is a tuple (text, text_b).

Example:

# For correct Text Pair Classification on raw dictionaries (inference mode), we need to put both
# texts (text, text_b) into a tuple.
# See corresponding conversion in the file_to_dicts() method of TextPairClassificationProcessor: https://github.com/deepset-ai/FARM/blob/5ab5b1620cb51ceb874d4b30c887e377ad1a6e9a/farm/data_handler/processor.py#L744
basic_texts = [
    {"text": ("how many times have real madrid won the champions league in a row",
              "They have also won the competition the most times in a row, winning it five times from 1956 to 1960")},
    {"text": ("how many seasons of the blacklist are there on netflix", "Retrieved March 27 , 2018 .")},
]
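
If you already have inference inputs in the old {"text": ..., "text_b": ...} layout, a small conversion step may be easier than rewriting them by hand. The following is only a sketch: to_pair_dicts is a hypothetical helper name and not part of FARM, and the sample sentences are made up for illustration.

```python
from typing import Any, Dict, List


def to_pair_dicts(old_dicts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert {"text": a, "text_b": b} dicts into {"text": (a, b)} dicts.

    Hypothetical helper (not part of FARM). Dicts whose "text" value is
    already a tuple are copied through unchanged.
    """
    converted = []
    for d in old_dicts:
        if isinstance(d["text"], tuple):
            # Already in the tuple layout described above.
            converted.append(dict(d))
            continue
        # Keep any extra keys, drop the separate "text_b" entry.
        new_d = {k: v for k, v in d.items() if k not in ("text", "text_b")}
        new_d["text"] = (d["text"], d["text_b"])
        converted.append(new_d)
    return converted


# Made-up aspect-based sentiment inputs in the old layout.
old_style = [
    {"text": "The battery lasts all day but the screen is dim", "text_b": "battery"},
    {"text": "The battery lasts all day but the screen is dim", "text_b": "screen"},
]

new_style = to_pair_dicts(old_style)
print(new_style)
```

The resulting list has the same shape as basic_texts above, so it should be usable as the dicts argument of inference_from_dicts on your loaded model.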

Please check whether this fixes your issue.