Knowledgator / unlimited_classifier

Universal text classifier for generative models


How to create a trie label?

HuangruiChu opened this issue · comments

Dear Knowledgator:

While reading the blog, I noticed that you say "We can represent possible generation outputs as a trie data structure, where the node is a token," and also "We represent labels as a tree of tokens."

Is it possible for our labels to already form a trie, so that your model can select the correct label from this trie-like label structure?

For example:

{'emotion': ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'],
'attitude': ['positive', 'negative', 'neutral']}

Based on the content, we would say the label should be "emotion" -> "love".

Hello, thank you for your interest in our project.

You are describing a tree-like structure for your labels at a high level, while we represent labels at the token level, which helps the model select only tokens that belong to the label space during generation. To combine both approaches, one option is to transform your labels into a textual format such as "emotion -> sadness" or "attitude-neutral", and then initialize the label trie of our classifier object with these strings.
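One way to do that transformation is a small helper that flattens the nested dict into flat label strings. This is just a sketch: `flatten_labels` is a hypothetical helper, not part of the library.

```python
def flatten_labels(label_tree, sep=" -> "):
    """Turn {'emotion': ['sadness', ...]} into ['emotion -> sadness', ...]."""
    return [f"{category}{sep}{value}"
            for category, values in label_tree.items()
            for value in values]

label_tree = {
    'emotion': ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'],
    'attitude': ['positive', 'negative', 'neutral'],
}

labels = flatten_labels(label_tree)
# labels now contains 9 strings, e.g. 'emotion -> love', 'attitude -> neutral'
```

The resulting flat strings can then be passed wherever the classifier expects its list of labels.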

In fact, I ran several tests:

labels = [
    'emotion_sadness',
    'emotion_joy',
    'emotion_love',
    'emotion_anger',
    'emotion_fear',
    'emotion_surprise',
    'attitude_positive',
    'attitude_negative',
    'attitude_neutral',
]

tokenized_labels = [
    [0, 13868, 834, 7, 9, 26, 655, 1],
    [0, 13868, 834, 1927, 63, 1],
    [0, 13868, 834, 5850, 15, 1],
    [0, 13868, 834, 9, 9369, 1],
    [0, 13868, 834, 89, 2741, 1],
    [0, 13868, 834, 3042, 102, 7854, 1],
    [0, 7525, 834, 26093, 1],
    [0, 7525, 834, 31600, 1],
    [0, 7525, 834, 8992, 8792, 1],
]

['emotion-sadness',
'emotion-joy',
'emotion-love',
'emotion-anger',
'emotion-fear',
'emotion-surprise',
'attitude-positive',
'attitude-negative',
'attitude-neutral']

[[0, 13868, 18, 7, 9, 26, 655, 1],
[0, 13868, 18, 1927, 63, 1],
[0, 13868, 18, 5850, 15, 1],
[0, 13868, 18, 9, 9369, 1],
[0, 13868, 18, 89, 2741, 1],
[0, 13868, 18, 3042, 102, 7854, 1],
[0, 7525, 18, 26093, 1],
[0, 7525, 18, 31600, 1],
[0, 7525, 18, 8992, 8792, 1]]
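For reference, the token-level trie the blog describes can be sketched directly from these sequences: shared prefixes such as `[0, 13868, 18]` collapse into a single path before branching. This is a minimal illustration, not the library's actual implementation.

```python
def build_trie(sequences):
    """Nested-dict trie: each node maps a token id to its child node."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

# A few of the hyphen-variant sequences from above.
tokenized_labels = [
    [0, 13868, 18, 7, 9, 26, 655, 1],   # 'emotion-sadness'
    [0, 13868, 18, 1927, 63, 1],        # 'emotion-joy'
    [0, 7525, 18, 26093, 1],            # 'attitude-positive'
]

trie = build_trie(tokenized_labels)
# After the leading 0, the only allowed continuations are 13868 ('emotion')
# and 7525 ('attitude'); after [0, 13868, 18] the trie branches over subwords.
allowed_after_start = sorted(trie[0].keys())  # [7525, 13868]
```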

These are bad label formats for the high-level tree structure, because the resulting trie does not capture the right words or the right structure.
The accuracy for emotion classification drops accordingly.

Here are two label formats that I think work well for the high-level tree structure:
['emotion sadness',
'emotion joy',
'emotion love',
'emotion anger',
'emotion fear',
'emotion surprise',
'attitude positive',
'attitude negative',
'attitude neutral']

[[0, 13868, 24784, 1],
[0, 13868, 3922, 1],
[0, 13868, 333, 1],
[0, 13868, 11213, 1],
[0, 13868, 2971, 1],
[0, 13868, 4158, 1],
[0, 7525, 1465, 1],
[0, 7525, 2841, 1],
[0, 7525, 7163, 1]]

['emotion -> sadness',
'emotion -> joy',
'emotion -> love',
'emotion -> anger',
'emotion -> fear',
'emotion -> surprise',
'attitude -> positive',
'attitude -> negative',
'attitude -> neutral']

[[0, 13868, 3, 13114, 24784, 1],
[0, 13868, 3, 13114, 3922, 1],
[0, 13868, 3, 13114, 333, 1],
[0, 13868, 3, 13114, 11213, 1],
[0, 13868, 3, 13114, 2971, 1],
[0, 13868, 3, 13114, 4158, 1],
[0, 7525, 3, 13114, 1465, 1],
[0, 7525, 3, 13114, 2841, 1],
[0, 7525, 3, 13114, 7163, 1]]

However, different high-level tree formats yield different performance:

'emotion sadness' etc.

              precision    recall  f1-score   support

     sadness     0.6955    0.5542    0.6169       581
         joy     0.6667    0.0029    0.0057       695
        love     0.1325    0.9057    0.2311       159
       anger     0.5159    0.5309    0.5233       275
        fear     0.6585    0.4821    0.5567       224
    surprise     0.0000    0.0000    0.0000        66

    accuracy                         0.3610      2000
   macro avg     0.4448    0.4126    0.3223      2000
weighted avg     0.5889    0.3610    0.3339      2000
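As a sanity check, the macro and weighted averages in this report can be reproduced from the per-class rows with plain arithmetic (nothing library-specific):

```python
# Per-class f1 scores and supports from the 'emotion sadness' report.
f1 = [0.6169, 0.0057, 0.2311, 0.5233, 0.5567, 0.0000]
support = [581, 695, 159, 275, 224, 66]

macro_f1 = sum(f1) / len(f1)  # unweighted mean over classes
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

# round(macro_f1, 4) == 0.3223 and round(weighted_f1, 4) == 0.3339,
# matching the macro avg and weighted avg rows of the report.
```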

'emotion -> sadness' etc.

                      precision    recall  f1-score   support

  emotion -> sadness     0.4155    0.8296    0.5537       581
      emotion -> joy     0.8257    0.1295    0.2239       695
     emotion -> love     0.4444    0.0252    0.0476       159
    emotion -> anger     0.8000    0.0436    0.0828       275
     emotion -> fear     0.2868    0.7054    0.4077       224
 emotion -> surprise     0.1667    0.3939    0.2342        66

            accuracy                         0.3860      2000
           macro avg     0.4898    0.3545    0.2583      2000
        weighted avg     0.5906    0.3860    0.3072      2000

The accuracy drops a lot compared with the example provided in https://blog.knowledgator.com/how-to-classify-text-into-millions-of-classes-68aee1de3802

It is amazing that you have done this experiment!

It looks like the first ways of writing labels fail because of tokenization: the tokenizer does not capture whole words and instead splits the text into less meaningful subword tokens. The last two formats perform better, and I think experimenting with prompts can improve the results further.
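The tokenization effect is visible directly in the token sequences above: with an underscore, each label splits into many short subword pieces, while with a plain space the word survives as a single meaningful token. The sequences below are copied from the examples earlier in this thread (including the leading 0 and trailing 1 special tokens).

```python
# Token sequences for the same label under three formats, taken from above.
underscore = [0, 13868, 834, 7, 9, 26, 655, 1]  # 'emotion_sadness'
space      = [0, 13868, 24784, 1]               # 'emotion sadness'
arrow      = [0, 13868, 3, 13114, 24784, 1]     # 'emotion -> sadness'

# 'emotion_sadness' needs 8 tokens, splitting 'sadness' into 5 fragments,
# whereas with a space 'sadness' stays one token (24784).
lengths = {
    'underscore': len(underscore),
    'space': len(space),
    'arrow': len(arrow),
}
```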

Also, such formats can be unusual for a model, so we additionally recommend fine-tuning the model on datasets that represent labels in this way. With some effort, existing datasets can easily be transformed into such a label format.

For the wiki article title prediction task, I assume you just used zero-shot with T5, right?

I am still wondering why the unlimited classifier performs so well on wiki article title prediction (around 0.9 accuracy) while reaching only about 0.65 accuracy on the emotion classification task.

Yes, we used zero-shot with the T5 model for the wiki article title prediction.

Emotion classification is more subjective, so it is harder for zero-shot models. If you revisit the emotion dataset, you will see that it is sometimes challenging to unambiguously assign a text to one of the classes.

Regarding wiki article title prediction: although this task is presented as classification over 6 million labels, it is simpler for the model because the title is often the most frequent entity in the text. However, when we prompted T5 to generate a title without any generation constraints, we got poor results. So, if a model understands the semantics of the article, limiting the possible generations to true wiki article titles can help it select the right title even from a very large label set.
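Constrained generation of this kind can be sketched with Hugging Face's `prefix_allowed_tokens_fn` hook for `model.generate`, which restricts next-token choices to children of the current trie node. This is only a sketch: the trie-building helper is illustrative, not the library's code, and the token ids are the ones tokenized earlier in this thread.

```python
def build_trie(sequences):
    """Nested-dict trie over token ids; each node maps token id -> child node."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def make_prefix_fn(trie):
    """Return a prefix_allowed_tokens_fn suitable for model.generate()."""
    def allowed_tokens(batch_id, input_ids):
        node = trie
        for tok in input_ids:        # walk the trie along tokens generated so far
            node = node.get(int(tok), {})
        return list(node.keys())     # only children of the current node are allowed
    return allowed_tokens

# Token ids for two labels, as tokenized earlier in this thread.
label_token_ids = [
    [0, 13868, 3, 13114, 24784, 1],  # 'emotion -> sadness'
    [0, 7525, 3, 13114, 7163, 1],    # 'attitude -> neutral'
]
prefix_fn = make_prefix_fn(build_trie(label_token_ids))

# Usage with transformers (not run here):
# model.generate(**inputs, prefix_allowed_tokens_fn=prefix_fn)
```

Because every generation step is limited to valid continuations, decoding can only ever produce one of the registered labels.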