dcmi / dctap

DC Tabular Application Profile

Home Page: https://dcmi.github.io/dctap/

AP definition for primer

kcoyle opened this issue

Original proposal was in #35, comment.

Current TAP primer definition

The creation of a new metadata schema from available open vocabularies creates what can be called an application profile, or "profile." A profile defines and constrains the property/value pairs that are used in metadata to describe resources and to provide the rules that govern the creation and reuse of metadata.

Singapore framework definition

The term profile is widely used to refer to a document that describes how standards or specifications are deployed to support the requirements of a particular application, function, community, or context. In the metadata community, the term application profile has been applied to describe the tailoring of standards for specific applications.

Proposed new primer wording

Within a community that creates and shares metadata there is often the need to work with variations of the general metadata framework that has been agreed. An application profile is a tailoring of a metadata standard for specific applications, functions or variations in the needs of a community that is sharing information. It is generally based on an expressed model and may include additional vocabulary elements from any available source. A profile defines the additional rules for how terms and concepts from existing metadata schema should be used.

(Note: this newly proposed definition takes from the SF and from Phil's framework, plus some wording of mine. It would be great to edit this down a bit.)

Sorry, this won't help in editing down the text, but I think that some parts of that definition are too restrictive. For example:

An application profile is a tailoring of a metadata standard for specific applications

I think it should be "one or more metadata standards", rather than treating other sources as a secondary aspect.

A profile defines the additional rules for how terms and concepts from existing metadata schema

I think I prefer Nishad's formulation around selecting, refining and explaining (I forget his exact wording)

For some reason the "one or more" makes me nervous. How about "one or a combination of metadata standards"? I think the "or more" sounds like they may not be related, and perhaps "combination" takes care of that.

I'll find Nishad's wording and see if I can fit it in.

I guess it depends on what you mean by "related", but I don't see that they have to be related, except that they are somehow relevant to the domain that the AP covers in some compatible way. On the surface DC Terms and ODRL aren't related, but they're both used in DCAT. But it's true that the Application Profile is (partly) about combining metadata vocabularies/standards, so I think "combination" would work.

Yes, I was just thinking about how people use these sorts of function-specific vocabularies like PROV or BIO, and ODRL is a good example. If our definition leans on Singapore Framework there is a concept of a community domain model. If people are creating a profile, rather than a new metadata standard, then they are creating a profile of something. I'd like to distinguish that something from the additions (or subtractions) of the profile. I was calling the something a standard, but as you point out, when you are using vocabularies those are often themselves a standard. So we need a word for that community domain model (and vocabulary).

I note here that I don't think we should follow the Singapore Framework to a T. I would have some nits to pick with it.

Also it may be that the AP builds equally on two or more standards/vocabularies. For example, our simple book example has an equal number of terms from dcterms and foaf--two from each--with the addition of one from schema.org. OK, that's a toy example, but I think the ratio could persist if we built out more detail.

So I suggest thinking along these lines. A community creating or using metadata will normally be doing so within some "domain", i.e. some field of endeavour. The SF suggests there should be an explicit model for the entities and relationships within this domain, that is, a domain model. Various (one or more) vocabulary standards may provide terms relevant to the domain. The purpose of the application profile is to select/constrain/explain terms from these vocabulary standards that are to be used by the community; if the SF is followed, this selecting/constraining/explaining will be in the context of the domain model.

I don't think we should distract ourselves with the fact that an application profile may become a standard, à la DCAT, and that then you can start getting application profiles of the application profile...

our simple book example has an equal number of terms from dcterms and foaf--two from each--with the addition of one from schema.org.

Auto-correct: I was only counting the properties. There are a couple of classes used as well, one each from schema.org and foaf. So the count is 2 from DCTerms, 2 from schema.org and 3 from FOAF.

@kcoyle

If people are creating a profile, rather than a new metadata standard, then they are creating a profile of something.

I am uneasy with this use of "standard", and even of the idea that a profile need be a profile of anything more than the metadata. As I see it, it would be quite valid to invent ten new metadata properties, ad hoc, and create a profile for metadata using just those ten properties. Those ten properties would not constitute a "standard" in any meaningful sense, and it would not make much sense to say that the profile "profiles" that set of ten properties.

To me, a profile draws on vocabularies and data models, defined somewhere outside the profile (though not necessarily in a "standard"), and constrains - or, as Karen puts it: defines rules for - their usage in metadata.

So instead of putting "standards" front and center, I'd prefer to put "vocabularies and data models (i.e., domain models)" front and center, then suggest that it is good for interoperability if profiles are based on published standards.

@philbarker The simple book example is a made-up example that is not based on a domain model. A domain model would be something like the BIBFRAME model, or the DataCite model. We haven't used (or shown) such models in our examples but perhaps we should.

@tombaker I agree that "standard" is a terrible term. Let's see if we can fit in "data models"/domain models and "vocabularies". We might say that one can combine domain models, such as a model for document description with a model for rights description.

BTW, is there a domain model for dcterms? I am thinking of dcterms as a vocabulary with minimal semantic constraints, and that a domain model would be constraints over that vocabulary.

@kcoyle

BTW, is there a domain model for dcterms? I am thinking of dcterms as a vocabulary with minimal semantic constraints, and that a domain model would be constraints over that vocabulary.

Yes, I agree - or, put another way, dcterms could be used in the context of a domain model. Mikael used to say that properties "decorate" the domain model. I don't recommend we use that word but I think I see what he means.

In our most recent discussion of Phil's style guidelines, aka "the Framework", we were trying to sort out which words to describe things in an application profile. The passage of interest is short enough to quote in its entirety:

An application profile describes, explains, and defines additional rules for how existing vocabularies and models should be used in a metadata instance.

An application profile comprises a set of constraints on metadata statements found in the instance data.

The DC TAP specification defines a tabular format for application profiles in which constraints on statements are the rows and the individual elements of those constraints form the columns.

A shape is a set of statement constraints …

...about which:

  • "we wound up not saying what a constraint is, except that it is in an AP. Which may be fine..." (Phil)
  • "rows = rules, descriptions and explanations for statements" (Phil)
  • It is still unclear whether we want to talk about "rules" or "constraints", or both, and if both, whether they are different.
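To make the rows-and-columns framing in the quoted passage concrete, here is a minimal sketch in Python. The column names (shapeID, propertyID, propertyLabel, mandatory, repeatable, valueConstraint, valueConstraintType) are DCTAP column names, but the three sample rows, the picklist values, and the helper `read_shapes` are invented for illustration and are not taken from the Primer or from any DCTAP tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample profile: the rows below are invented for illustration.
# Each row constrains one statement; each column holds one element of that
# constraint.
TAP_CSV = """shapeID,propertyID,propertyLabel,mandatory,repeatable,valueConstraint,valueConstraintType
book,dcterms:title,Title,true,false,,
book,dcterms:creator,Author,false,true,,
book,dcterms:language,Language,false,false,en fr de,picklist
"""


def read_shapes(text):
    """Group the rows of a tabular profile by the value of their shapeID column."""
    shapes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        shapes[row["shapeID"]].append(row)
    return dict(shapes)


shapes = read_shapes(TAP_CSV)
for shape_id, rows in shapes.items():
    # Print each shape with the properties its rows constrain.
    print(shape_id, [row["propertyID"] for row in rows])
```

Read this way, a shape is simply the group of rows that share a shapeID, whichever word ("constraint", "rule", "template") ends up naming the individual row.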

I note that in the most recent version of the Primer, which was last edited almost one year ago, we refer to a shape, basically, as a "list of properties and their constraints".

I kinda agree that "constraints on statements" (or "statement constraints") is not ideal, but I cannot warm to "rules".

I am reminded that Mikael Nilsson made a distinction between "templates" and the "constraints" they held. Unsurprisingly, below the surface differences of terminology, the DSP model has a lot of the same things as DCTAP, including cardinality.

If I recall, the argument against "template" in the early days of DCTAP was that the term would be misunderstood in the library world. However, I note that BIBFRAME Profiles has "templates" (for example, see the Sound Recording template).

I still think "template" is a solid name for the construct that pulls together constraints on statements.

DC-DSP

The other term I would like to put on the table is "pattern".

"Pattern", to me, has roughly the same connotations as "template".

("Mold" also comes to mind but... no.)

To be clear, I do not think referring to "statement templates" need in any way affect the column names - valueConstraint, valueConstraintType, and the like.

I also see some value in aligning, on this point, with DSP and Bibframe.

@tombaker I think that the problem with "template" is that it is usually a user interface term for what you see on the screen with boxes to be filled in. A TAP could be the source of information for a developer creating a user interface for data input, but it is not itself that kind of template. Perhaps if we define it well, for example: "a row is a template for the statement that gives the rules and constraints, often within the context of a shape".

Also:

I note that in the most recent version of the Primer, which was last edited almost one year ago, we refer to a shape, basically, as a "list of properties and their constraints".

Yes, this is one of the areas of the primer that jumped out as needing work - the "list of properties" was the question, though, not the constraints. I'm hoping to go through the primer in detail when we finish with the framework and propose edits based on what we come up with.

Why not "rule"?

  • Profiles are created for a range of purposes - to suggest a way to describe something in data, to articulate expectations about what one could (or should) find in data, to generate an input form for creating data, or to validate data, where validation can be either tolerant or strict. In other words, actual profiles range, in their intent, from the loosely descriptive to the strictly prescriptive. To me, the word "rules" pretty strongly implies prescription.
  • The term "rule" has baggage in the standards world and, as far as I can see, most meanings imply complexity and/or precision. The CfP for a 2005 W3C workshop on rule languages, for example, says: "Rule languages and rule systems are widely used in applications ranging from database integration, service provisioning, and business process management to loan underwriting, privacy policies and Web services composition." The term "business rule" is also pretty well-known, but quite different from what we would mean by "rule" in a DCTAP context.

@tombaker I don't think we'll find a term for any of this that doesn't have any baggage. I think that's why we should use sentences rather than expecting a specific term to say all that we mean. I think the framework does that well. It shows the terminology in a sentence that functions both as a context and as a definition. So whatever we use we will need to be showing our readers what we mean. As we've done with the framework (thanks @philbarker !) our main goal is to avoid using the same terms in different ways. Using sentences can avoid any overly strict interpretations of terms because a sentence can say more than a single term can. I don't think we need a strict terminology along the lines of a formal standard, but we do need to be clear and consistent.

@kcoyle

I don't think we'll find a term for any of this that doesn't have any baggage
...
I don't think we need a strict terminology along the lines of a formal standard, but we do need to be clear and consistent.

Yes, agreed! I have always felt that we shouldn't try to be too formal with DCTAP, so it is good and appropriate that @philbarker spins this as a style guide.

On further reflection about my post above, I do not believe I properly considered the use of "rule" in its AACR sense (a definition of which I could not find) or the use of "business rule" in the discourse about library cataloging. I do not know enough about those usages to judge how they could relate to "rule" as (potentially) used in the DCTAP context.

In primer as: An application profile defines metadata usage for a specific application.