adymaharana / StoryViz

Evaluation metrics

KyonP opened this issue · comments

Hello, I hope your research goes well. 😀

I am trying to evaluate my model with the metrics you proposed.

I have read your paper, but I would like to double-check a few things.
(My results seem a bit odd and off the scale, which is why. 😒)

  1. I presume that the "character F1" score corresponds to the "micro avg" F1 reported by your eval_classifier.py code? Am I correct?
  2. Also, does "Frame accuracy" correspond to the "Image Exact Match Acc" output of the same eval_classifier.py code?
  3. Are the BLEU-2 and BLEU-3 scores scaled by 100? I tested your translate.py code on my generated images and got scores of about 0.04, so I wonder whether the BLEU scores you reported were multiplied by 100. (I have included a sketch of how I currently compute these metrics right after this list.)
  4. Lastly, the R-precision evaluation procedure is unclear to me. Do I need to train your H-DAMSM code myself? If so, when is the right time to stop training before benchmarking my model?
  5. For a fair comparison, would it be possible to share your pretrained H-DAMSM weights?
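
For reference, here is roughly how I compute items 1–3 on my side. The arrays and captions are placeholders, and I am assuming the standard scikit-learn / NLTK definitions (micro-averaged F1, subset exact-match accuracy, cumulative BLEU), not anything specific to your repo:

```python
import numpy as np
from sklearn.metrics import f1_score, accuracy_score
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# --- Character F1 and frame accuracy (dummy multi-label arrays) ---
# y_true / y_pred: (num_frames, num_characters) binary indicators,
# 1 if the character appears in the frame.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])

# Micro-averaged F1 over all (frame, character) decisions.
char_f1 = f1_score(y_true, y_pred, average="micro")

# Exact-match ("subset") accuracy: a frame only counts if every
# character label in it is predicted correctly.
frame_acc = accuracy_score(y_true, y_pred)

print(f"character F1: {100 * char_f1:.2f}")
print(f"frame accuracy: {100 * frame_acc:.2f}")

# --- BLEU-2 / BLEU-3 (placeholder captions) ---
references = [[["a", "girl", "plays", "in", "the", "garden"]]]
hypotheses = [["a", "girl", "is", "in", "the", "garden"]]
smooth = SmoothingFunction().method1

bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5),
                    smoothing_function=smooth)
bleu3 = corpus_bleu(references, hypotheses, weights=(1/3, 1/3, 1/3),
                    smoothing_function=smooth)

# NLTK returns values in [0, 1]; multiplying by 100 would explain the
# gap between my ~0.04 and the numbers reported in the paper.
print(f"BLEU-2: {100 * bleu2:.2f}, BLEU-3: {100 * bleu3:.2f}")
```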

I am currently stuck on the R-precision evaluation with H-DAMSM, so I was considering using the recent CLIP R-precision instead (roughly sketched below), but I am raising this issue first to avoid any fairness concerns in the comparison.
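
This is the kind of CLIP R-precision I had in mind, using OpenAI's clip package. The "ViT-B/32" backbone, the 99-distractor protocol, and the image/caption inputs are my own assumptions, not something taken from your repo:

```python
import random
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_r_precision(image_paths, captions, num_distractors=99, seed=0):
    """Fraction of generated images whose ground-truth caption ranks first
    against randomly sampled mismatched captions (assumed protocol)."""
    rng = random.Random(seed)
    hits = 0
    for idx, path in enumerate(image_paths):
        # Candidate texts: the true caption plus sampled distractors.
        distractor_ids = [i for i in range(len(captions)) if i != idx]
        sampled = rng.sample(distractor_ids, num_distractors)
        texts = [captions[idx]] + [captions[i] for i in sampled]

        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        tokens = clip.tokenize(texts, truncate=True).to(device)

        with torch.no_grad():
            image_feat = model.encode_image(image)
            text_feats = model.encode_text(tokens)

        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
        sims = (image_feat @ text_feats.T).squeeze(0)

        # Index 0 is the ground-truth caption; count a hit if it ranks first.
        if sims.argmax().item() == 0:
            hits += 1
    return hits / len(image_paths)
```

If you think this is an acceptable substitute for the H-DAMSM R-precision, I can report it separately; otherwise I would prefer to use your pretrained H-DAMSM for comparability.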