Agree on collection name/id of web collections
lintool opened this issue
The official names are GOV2, ClueWeb09b, and ClueWeb12-B13 - this is the correct casing. But for ids, do we sometimes downcase everything, e.g., robust04?
Doesn't really matter - we just need to agree on something...
The README doesn't have a catalog of the collections, does it? Should we create such a table and add the existing collections plus the ones above?
I'm fine with either the full name or a lowercased short name, as long as everything is consistent. How about core17, core18, cw09b, cw12b13, and robust04, to follow what's there currently and keep compatibility with any images that assume lowercase?
I can add a table once we come to a decision.
Your proposals above are fine... and gov2.
Fixed in #77
Is it possible to just use cw12b, since it's safe to assume people know what that is? Not a huge deal, but it does make the ClueWeb stuff a bit more consistent IMO.
fine with me too if someone has an opinion... cc @r-clancy
Fine with me - I'll make the change.
Thanks!
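
For reference, the convention agreed on in this thread (lowercase short ids, official names kept in their original casing) could be captured in a small lookup like the one below. This is only a sketch, not code from the repository: the names WEB_COLLECTIONS, ALIASES, and resolve_collection_id are hypothetical, and treating cw12b13 as a backwards-compatible alias of cw12b is an assumption. Only the three web collections whose official names are given above are included.

```python
# Hypothetical catalog: keys are the lowercase short ids agreed on in this
# thread; values are the official names with their correct casing.
WEB_COLLECTIONS = {
    "gov2": "GOV2",
    "cw09b": "ClueWeb09b",
    "cw12b": "ClueWeb12-B13",
}

# Assumption: keep the earlier id working as an alias of the shorter one.
ALIASES = {"cw12b13": "cw12b"}


def resolve_collection_id(name: str) -> str:
    """Map an id, alias, or official name (any casing) to the canonical id."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    if key in WEB_COLLECTIONS:
        return key
    # Fall back to a case-insensitive match on the official name.
    for collection_id, official_name in WEB_COLLECTIONS.items():
        if official_name.lower() == key:
            return collection_id
    raise KeyError(f"unknown collection: {name!r}")
```

With a lookup like this, "ClueWeb12-B13", "cw12b13", and "CW12B" would all resolve to cw12b, so images or scripts that assume lowercase ids keep working while the official casing stays available for documentation.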