force11 / force11-sciwg

FORCE11 Software Citation Implementation Working Group

Home Page: https://www.force11.org/group/software-citation-implementation-working-group

"archived" language

dbouquin opened this issue · comments

It would be valuable to define what we mean by "archived", as well as "published" vs. "unpublished". When we say "archived", do we just mean that the repository is purposefully working to preserve the software? Or that it implements archival standards that do not contradict the software citation principles? Compare, e.g., ASCL vs. Zenodo.

ASCL describes itself as a registry, which I think is different than an archive.

My initial definitions (which certainly could use work) are:

  • archived: stored in a way that is intended to preserve the software
  • published: submitted by a creator, along with metadata, to an entity (such as a repository) that archives the software and provides an identifier that resolves to a landing page that contains the metadata and the software
  • unpublished: not submitted by a creator to an archive or repository (though Software Heritage may archive it anyhow)

In discussion today, it seemed like there may be value in adding a sentence about what we're referring to as an "index".

Thinking about this a bit more, a useful set of concepts seems to me to be:

  • available (the difference between open source and closed source/proprietary)
  • archived (preserved long term)
  • stewarded (actively preserved?)
  • identified (has an identifier that is tied to the software, maybe separate from its location)
  • indexed (the identifier is one of a list maintained by some indexing service)

Feel free to improve these definitions or suggest different terms.

ASCL started out solely as a repository, requiring code deposit, and still acts as one for authors who want us to serve an archive file of their code. We dropped the requirement for code deposit in 2010, and started using "registry" a couple of years later to let people know we don't require code deposit. We do still accept (and, for software in danger of being lost, beg for) code deposit, assign DOIs for codes we house, and serve those codes to the public.

Dan, re your list, I would use "maintained" for software that is actively maintained by developers, and rather than "stewarded", would use "curated" for code that is actively preserved.

Also, "published" often means available online. You've defined "published" and "archived" as essentially the same thing.

I think the first footnote does a good job clarifying the reasoning behind "published" and would suggest sticking with "stewarded" (though we don't really use that language in the document). We only say "...that repository steward the software with long-term preservation as their goal..." One can curate a collection without archiving it.
"Maintained" seems fine to me, but I don't see where we should add it to the document right now.

@owlice - thanks for your thoughts. I think we've moved from using these issues back to mostly having discussion in the document, though, and this issue is a bit out of date compared to the document. Can you take a look at the current version and see if you have comments to make there?

And I'm going to close a bunch of these issues now, as having discussion in two places is probably counterproductive as we get close to completing the document.