paperswithcode / sota-extractor

The SOTA extractor pipeline

Duplicate agents in properties table

dhamaris opened this issue · comments

Hello,

I am not sure whether this is a bug or a feature:
https://paperswithcode.com/sota/visual-question-answering-on-gqa-test2019

image

Is it OK that the agent is repeated? I assumed there would be only one agent when all other fields (paperUrl, date, source, etc.) are the same.

Thank you

redundant_tuplas_by_metric_pwc.xlsx
I don't know if this will be of any use, but I identified the tuples of experiments that contain more than one value when grouping by task, dataset, agent, paperDate, paperUrl and metric.
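For anyone who wants to reproduce that check, here is a minimal sketch of the grouping. It assumes the data has already been flattened into one dict per (experiment, metric) pair; the field names mirror the ones mentioned above, while the `value` field, the helper name `find_redundant_groups` and the sample rows are made up for illustration and do not come from the real tables.

```python
from collections import defaultdict

# Fields used as the grouping key, as listed in the comment above.
KEY_FIELDS = ("task", "dataset", "agent", "paperDate", "paperUrl", "metric")

# Hypothetical flattened rows (placeholder names and values, not real results).
rows = [
    {"task": "example-task", "dataset": "example-dataset", "agent": "agent-x",
     "paperDate": None, "paperUrl": None, "metric": "Accuracy", "value": 1.0},
    {"task": "example-task", "dataset": "example-dataset", "agent": "agent-x",
     "paperDate": None, "paperUrl": None, "metric": "Accuracy", "value": 2.0},
    {"task": "example-task", "dataset": "example-dataset", "agent": "agent-y",
     "paperDate": None, "paperUrl": None, "metric": "Accuracy", "value": 3.0},
]


def find_redundant_groups(rows):
    """Return {key: [values]} for every key that maps to more than one value."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        groups[key].append(row["value"])
    return {key: values for key, values in groups.items() if len(values) > 1}


redundant = find_redundant_groups(rows)
for key, values in redundant.items():
    print(key, values)
```

Note that rows whose paperDate and paperUrl are both null fall into the same group whenever the remaining fields match, which is exactly the case where the grouping cannot tell two submissions apart.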

@dhamaris Thanks for opening this issue. Looking at the original source of the results, the submissions appear to be unique in the sense that they come from two different authors. Because we pull information from other sources, we expect these scenarios to appear, but they shouldn't be too common. Ideally, the model names should be unique.

Here is the original source of the results you pointed out: https://eval.ai/web/challenges/challenge-page/225/leaderboard/733#leaderboardrank-18

@omarsar Thank you for your swift response. We normalized evaluation-tables.json into a DB, storing each tuple found in the tree.
The problem is that I lost the information needed to tell whether an experiment belongs to a specific contribution or not; it is all mixed together:

image

I tried to group by paperDate and paperUrl, but that does not solve the problem when they are null or when they match.
So I am considering adding a new field to my model, called something like contributionId, that identifies a unique SOTARow object, but I am still not sure. What is your take on this? Are these 2 experiments the same, but added by 2 people? Or are they 2 distinct items? Why would you say that these scenarios shouldn't be too common? I am trying to understand this table. Is it uncommon for people to report multiple results for the same agent?

And if the modelNames should be unique, does that mean this redundant information will be corrected at some point?
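To make the contributionId idea concrete, here is a rough sketch of how such an identifier could be assigned while flattening, before the link to the original row is lost. The input shape (a list of rows, each with `agent`, `paperUrl`, `paperDate` and a `metrics` dict) is an assumption made for illustration, not the documented layout of evaluation-tables.json, and `flatten_rows` is a hypothetical helper.

```python
from itertools import count


def flatten_rows(sota_rows):
    """Flatten leaderboard rows into one record per metric.

    Every record produced from the same source row shares one contributionId,
    so rows stay distinguishable even when agent, paperUrl and paperDate match
    or are null.
    """
    ids = count(1)
    flat = []
    for row in sota_rows:
        contribution_id = next(ids)  # one id per original row
        for metric, value in row["metrics"].items():
            flat.append({
                "contributionId": contribution_id,
                "agent": row["agent"],
                "paperUrl": row.get("paperUrl"),
                "paperDate": row.get("paperDate"),
                "metric": metric,
                "value": value,
            })
    return flat


# Hypothetical input: two submissions that share an agent name and have no paper.
sota_rows = [
    {"agent": "agent-x", "paperUrl": None, "paperDate": None,
     "metrics": {"Accuracy": 1.0, "Binary": 2.0}},
    {"agent": "agent-x", "paperUrl": None, "paperDate": None,
     "metrics": {"Accuracy": 3.0, "Binary": 4.0}},
]

flat = flatten_rows(sota_rows)
```

The id here is just a running counter, so it is only stable for a given dump; if the ids need to survive re-imports, some other scheme would be required.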

I am not sure how best to achieve what you are after. I can't be certain, but I am inclined to think that the results correspond to two different experiments by two different users. Maybe they use a similar model, backbone, or experimental setup, or even the same code base, as is common in public community leaderboards such as Kaggle. When results are obtained from external leaderboards, as in this case, we may see this type of result. When results are added on our website directly, it is less common, because results are added through papers, which helps preserve a one-model-to-one-result relationship.

Thank you, we will keep the relationship 1 to 1 as well.