GanjinZero / Triaffine-nested-ner

Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition [ACL 2022 Findings]

Home Page: https://arxiv.org/abs/2110.07480

I'm getting an F1 score of zero.

ilaouirine opened this issue · comments

Hello,
I want firstly to thank you for making your code publicly available.
Actually, I was trying to reproduce your results on the ACE04 dataset (I'm using only the sample data provided here, since I don't have access to the full dataset), but I'm getting an F1 score of 0.
Dev_Epoch24 {'p': 0, 'r': 0, 'f1': 0}

Do you have any insights on what might be causing the problem?
Thank you!

With only the sampled data, there is too little signal for the model to learn. You may try the GENIA dataset, whose data is public, to reproduce our results.

Ok thank you!

Hello,
I am now trying to work with my own dataset, but I don't know how to set the padding_idx and the dim, i.e., these two parameters: word_embedding_config['padding_idx'] and word_embedding_config['dim']

Thank you!

Run word_embed.py on your dataset and you will obtain word2id.json; you can check the padding_idx there. The word embedding dim depends on which pretrained word embeddings you used.
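As a concrete illustration, here is a minimal sketch of how the two values relate to the word2id.json produced by word_embed.py. The token names (e.g. `<pad>`) and the 100-dimensional vector below are assumptions for illustration only; check your actual vocabulary and embedding file.

```python
# Hypothetical excerpt of a word2id.json produced by word_embed.py;
# the actual token names depend on your vocabulary.
word2id = {"<pad>": 0, "<unk>": 1, "the": 2, "cat": 3}

# padding_idx is the id assigned to the padding token in word2id.json.
padding_idx = word2id["<pad>"]

# dim is the vector length of the pretrained embeddings you use,
# e.g. 100 for glove.6B.100d or 300 for glove.840B.300d.
sample_vector = [0.0] * 100  # one row of a hypothetical 100d embedding file
dim = len(sample_vector)

word_embedding_config = {"padding_idx": padding_idx, "dim": dim}
print(word_embedding_config)  # {'padding_idx': 0, 'dim': 100}
```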

Thank you very much for your assistance.
Actually, I tried to start the training on my own dataset, but I am getting this error :
[screenshot: error_model]

Strangely, this error only occurs when I attempt to train the model on the entire dataset. When I experimented with a subset of the same dataset, training proceeded without any errors; however, despite successful training, the resulting F1 score was consistently 0.

I would be grateful if you could help me understand the cause of this error and why the F1 score remains at 0 when training with the subset. Any insights or suggestions you can provide to troubleshoot this issue would be highly appreciated.

Thank you once again for your invaluable support.

Is there a limit for the dataset size?

There is no limit on the dataset size; your error may be due to an empty sentence, or an empty word in some sentence.
As for the F1 score, I have no idea. I remember the model outputs predictions every epoch; you should check those first.
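A quick way to screen a dataset for such samples, assuming a JSON-style format where each sample carries a list of tokens (the `tokens` field name here is an assumption; adjust it to your actual data format):

```python
# Flag samples that are empty sentences or contain empty words.
# The "tokens" field name is a placeholder for your dataset's format.
samples = [
    {"tokens": ["John", "lives", "in", "Paris"]},
    {"tokens": []},              # empty sentence
    {"tokens": ["a", "", "b"]},  # contains an empty word
]

bad = [i for i, s in enumerate(samples)
       if not s["tokens"] or any(w == "" for w in s["tokens"])]
print(bad)  # indices of problematic samples: [1, 2]
```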

Thank you for your response!
I will check if there are empty sentences!
Concerning the predictions, this is what I found :
For the first epochs, the predictions look like: 2,11 UNIT|13,14 UNIT|0,8
For the last epochs, the prediction files are empty.

You should use this framework with more samples to get normal results.

Thank you very much for your assistance!
The error is fixed, but now I am running into a CUDA out-of-memory error even with a batch size of 1.
Is there any way to resolve this without changing the GPU I'm using?
Thank you!

What is your GPU's memory capacity?
You should use a small LM (e.g., roberta-base) and reduce the triaffine dimension to 128/96/64.
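For reference, a hypothetical sketch of memory-saving settings along those lines; the key names below are assumptions for illustration, not this repo's exact config arguments:

```python
# Illustrative memory-saving settings; key names are assumptions,
# not the repo's exact config keys.
config = {
    "bert_name_or_path": "roberta-base",  # smaller LM than roberta-large
    "triaffine_dim": 128,                 # try 128, then 96 or 64 if still OOM
    "batch_size": 1,
}
print(config["triaffine_dim"])
```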