osirrc / jig

Jig for the Open-Source IR Replicability Challenge (OSIRRC)

Agree on collection name/id of web collections

lintool opened this issue

The official names are GOV2, ClueWeb09b, and ClueWeb12-B13; this is the correct casing. But for ids we sometimes downcase everything, e.g., robust04?

Doesn't really matter - we just need to agree on something...

The README doesn't have a catalog of the collections...? Should we create such a table and add the existing collections plus the ones above?

I'm fine with either the full name or a lowercased short name, as long as everything is consistent. How about core17, core18, cw09b, cw12b13, and robust04, to follow what's there currently and keep compatibility with any images that assume lowercase?

I can add a table once we come to a decision.

Your proposals above are fine... and gov2.

Fixed in #77

Is it possible to just use cw12b since it's safe to assume people know what that is? Not a huge deal but it does make the clueweb stuff a bit more consistent IMO.

fine with me too if someone has an opinion... cc @r-clancy

Fine with me - I'll make the change.

Thanks!
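
For reference, the naming agreed in this thread could be captured as a small lookup table. This is only a sketch, not code from the jig: `COLLECTIONS` and `resolve_collection_id` are hypothetical names, and it assumes the `cw12b` rename was adopted as discussed. Official names for robust04, core17, and core18 were not stated here, so they are left as `None` rather than guessed.

```python
from typing import Dict, Optional

# Hypothetical catalog: agreed lowercase id -> official name from this thread.
# None means the official name was not given in the discussion.
COLLECTIONS: Dict[str, Optional[str]] = {
    "robust04": None,
    "core17": None,
    "core18": None,
    "gov2": "GOV2",
    "cw09b": "ClueWeb09b",
    "cw12b": "ClueWeb12-B13",  # assumes cw12b replaced cw12b13
}


def resolve_collection_id(name: str) -> str:
    """Return the lowercase id for an id or official name, ignoring case."""
    key = name.strip().lower()
    if key in COLLECTIONS:
        return key
    # Fall back to matching against the known official names.
    for collection_id, official in COLLECTIONS.items():
        if official is not None and official.lower() == key:
            return collection_id
    raise KeyError(f"unknown collection: {name!r}")
```

With this table, `resolve_collection_id("ClueWeb12-B13")` and `resolve_collection_id("CW12B")` both give `cw12b`, while the old `cw12b13` id raises `KeyError`, which would flag any image still using it.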