adiwg / mdTranslator

Metadata translation tool built using Ruby

Home Page: https://www.adiwg.org/mdTranslator/

sbJSON provenance object should map to metadataInfo

dkarthur opened this issue

The sbJSON reader currently maps the sbJSON provenance object to the resource citation object of the internal translator data format.

Dates and contacts associated with the metadata record itself, rather than with the ScienceBase item being referenced, are being translated incorrectly. Dates are being mapped from the sbJSON “provenance” to mdJSON “resourceInfo,” while contacts are not being translated at all. Addressing this issue is a critical need for NGGDPP and ReSciColl developers in order to provide appropriate metrics for USGS and external ReSciColl users and stakeholders.

sbJSON Example:

"provenance": {
		"dateCreated": "2023-01-10T17:39:42Z",
		"lastUpdated": "2023-01-10T19:41:42Z",
		"lastUpdatedBy": "vcrystal@usgs.gov",
		"createdBy": "vcrystal@usgs.gov"
}

Current mdJSON translation:

"resourceInfo": {
	"citation": {
		"title": "Carlsbad Cores Collection",
		"date": [
			{
				"date": "2023-01-10T17:39:42+00:00",
				"dateType": "creation"
			},
			{
				"date": "2023-01-10T19:41:42+00:00",
				"dateType": "lastUpdate"
			},
			{
				"date": "2023-01-10",
				"dateType": "creation",
				"description": "Creation"
			}
		],
		"responsibleParty": [
			{
				"role": "owner",
				"party": [
					{
						"contactId": "40fff240-e50e-49a7-9e6a-259326e5e866"
					}
				]
			}
		]
	}
...

Desired translation:

metadataInfo > metadataDate

  • sbJSON “dateCreated” -> mdJSON dateType = "creation"
  • sbJSON "lastUpdated" -> mdJSON dateType = "lastUpdate"

metadataInfo > metadataContact

  • sbJSON "createdBy" -> mdJSON role = "author"
  • sbJSON "lastUpdatedBy" -> mdJSON role = "editor"

The “createdBy” and “lastUpdatedBy” properties in the sbJSON “provenance” section currently do not appear anywhere in the mdJSON output from mdTranslator. They should be mapped to “metadataInfo” > “metadataContact” with a “role” of “author” (or "curator") and “editor”, respectively.
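The desired mapping could be sketched in Ruby roughly as follows. This is only an illustration, not the mdTranslator reader code: the method name `map_provenance_to_metadata_info` and the plain-hash output shape are assumptions, and using the sbJSON user string directly as a `contactId` is a placeholder for whatever contact resolution the translator would actually need.

```ruby
require 'json'

# Hypothetical helper (not part of mdTranslator): builds a metadataInfo-like
# hash from an sbJSON provenance hash, following the mapping proposed above.
def map_provenance_to_metadata_info(provenance)
  date_types = { 'dateCreated' => 'creation', 'lastUpdated' => 'lastUpdate' }
  roles = { 'createdBy' => 'author', 'lastUpdatedBy' => 'editor' }

  # One metadataDate entry per provenance date that is present.
  dates = date_types.map do |sb_key, date_type|
    value = provenance[sb_key]
    next if value.nil?

    { 'date' => value, 'dateType' => date_type }
  end.compact

  # One metadataContact entry per provenance user that is present.
  contacts = roles.map do |sb_key, role|
    value = provenance[sb_key]
    next if value.nil?

    # Assumption: the sbJSON user string stands in for a real contactId.
    { 'role' => role, 'party' => [{ 'contactId' => value }] }
  end.compact

  { 'metadataDate' => dates, 'metadataContact' => contacts }
end

# The provenance object from the sbJSON example in this issue.
provenance = JSON.parse(<<~SBJSON)
  {
    "dateCreated": "2023-01-10T17:39:42Z",
    "lastUpdated": "2023-01-10T19:41:42Z",
    "lastUpdatedBy": "vcrystal@usgs.gov",
    "createdBy": "vcrystal@usgs.gov"
  }
SBJSON

metadata_info = map_provenance_to_metadata_info(provenance)
puts JSON.pretty_generate(metadata_info)
```

Missing provenance fields are simply skipped, so an item without "createdBy" would yield dates but no author contact.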

@dkarthur what do you mean by "contacts are not being translated at all"? The code snippet you provided seems to display a responsibleParty.

Regarding "...with “role” of “author” (or "curator")" I would probably recommend "author" as this is an ISO code described as "party who authored the resource". "Curator" is an ADIwg extended code defined as "party who serves as curator for specimens deposited in a repository". There is also an "originator" (party that created the resource), which might be applicable if the metadata was "authored" by one party but then uploaded to the system by a second party?

@hmaier-fws Perhaps I should've phrased it as: Contacts from sbJSON provenance object are not being mapped to mdJSON. When not using mdEditor, I don't know how to resolve the mdJSON responsibleParty code. It doesn't appear to map to the ScienceBase user who created the metadata record there, and it's that information that doesn't appear to be coming through the translator at all.

Also, to be sure I understand your comment on the second part, when you refer to "resource," are you referring to whatever it is to which the metadata refers, not the metadata record itself, or are you referring to the metadata?

The module_provenance.rb only handles the "dateCreated" and "lastUpdated" fields. The "lastUpdatedBy" and "createdBy" fields are dropped by the sbJson reader.

In addition to the above, the sbJson "dates" field is also added to the resourceInfo section in module_date.rb. The result is that there can be two creation dates in the resourceInfo section.
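As a quick way to see the duplication, the citation dates from the "Current mdJSON translation" example can be grouped by dateType. This is a minimal sketch; `duplicated_date_types` is a hypothetical helper written for this comment, not something in mdTranslator.

```ruby
# Citation dates copied from the "Current mdJSON translation" example above.
citation_dates = [
  { 'date' => '2023-01-10T17:39:42+00:00', 'dateType' => 'creation' },
  { 'date' => '2023-01-10T19:41:42+00:00', 'dateType' => 'lastUpdate' },
  { 'date' => '2023-01-10', 'dateType' => 'creation', 'description' => 'Creation' }
]

# Hypothetical helper: returns the dateType values that appear more than once.
def duplicated_date_types(dates)
  dates.group_by { |d| d['dateType'] }
       .select { |_type, entries| entries.size > 1 }
       .keys
end

puts duplicated_date_types(citation_dates).inspect
```

For the example above, the only duplicated type reported is "creation".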

mdJson: [image]

sbJson: [Screenshot from 2023-04-03 11-19-32] [Screenshot from 2023-04-03 11-20-11]

A snapshot from ScienceBase's documentation.

Provenance:

Datatype: Provenance object
The ScienceBase Provenance attribute is an open text field that is used to describe the origin of an item, especially in terms of how the item came to be introduced to ScienceBase. It can be used to describe the full provenance of some form of data that may have been through a number of derivations.

provenance Object
annotation
Datatype: String
The text of the provenance.

dataSource
Datatype: String
Where the item came from. If this item was created by a person in ScienceBase it will be "Input Directly". If it was harvested from an external source this will show that instead.

dateCreated
Datatype: DateTime
The date and time the item was created.

createdBy
Datatype: String
The person or organization who created the item.

lastUpdated
Datatype: DateTime
The date and time the item was last updated.

lastUpdatedBy
Datatype: String
The last person or organization to update the item.

"provenance":
{
"annotation":"Provenance1",
"dataSource":"Input directly",
"dateCreated":"2015-11-09T19:02:45Z",
"lastUpdated":"2015-11-09T19:02:45Z",
"lastUpdatedBy":"abc@usgs.gov",
"createdBy":"abc@usgs.gov",
"fileProcess": ???,
"linkProcess": ???
}

Verified that createdBy and lastUpdatedBy are not being populated in ScienceBase; this is not caused by the sbJSON-to-mdJSON translation. In addition, dataSource is not populated either. How, when, or whether it is currently used by ScienceBase is unknown. @dkarthur will run use case tests to help us understand how and when provenance is created and updated as follows:

  1. Created using the ReSciColl Dashboard app
  2. Created using ScienceBase
  3. Created using mdEditor

For each create example, test update in mdEditor and re-publish to ScienceBase (update item) to help us determine if update processes have different logic than create processes regarding writes to provenance.

Test update in ScienceBase regardless of create method, update in mdEditor and re-publish to ScienceBase.

Request to ScienceBase team:

  1. ScienceBase API scripts will need to be updated to populate createdBy and lastUpdatedBy
  2. Relative to test findings, ScienceBase API scripts may need additional changes

Agreement with @dkarthur to:

  1. map createdBy and lastUpdatedBy to: [schema{ } > metadata{ } > metadataInfo{ } > metadataDate[ ] > object{ } > description]
  2. Accept the proposal to remap sbJSON > provenance dateCreated and lastUpdated to metadataDate > date, with dateType "creation" and "lastUpdate" respectively
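
The agreed mapping could look roughly like the Ruby sketch below. It is an illustration only: `provenance_to_metadata_dates` is a hypothetical helper, the output is a plain hash rather than the translator's internal objects, and putting the raw sbJSON user string into the description is an assumption about the final format.

```ruby
# Hypothetical helper (not mdTranslator code): builds metadataDate entries
# whose description carries the sbJSON user, per the agreement above.
def provenance_to_metadata_dates(provenance)
  # [sbJSON date key, mdJSON dateType, sbJSON user key]
  mapping = [
    %w[dateCreated creation createdBy],
    %w[lastUpdated lastUpdate lastUpdatedBy]
  ]

  mapping.map do |date_key, date_type, user_key|
    next if provenance[date_key].nil?

    entry = { 'date' => provenance[date_key], 'dateType' => date_type }
    # Assumption: the description is simply the raw sbJSON user string.
    entry['description'] = provenance[user_key] if provenance[user_key]
    entry
  end.compact
end

# The provenance object from the sbJSON example at the top of this issue.
provenance = {
  'dateCreated' => '2023-01-10T17:39:42Z',
  'lastUpdated' => '2023-01-10T19:41:42Z',
  'lastUpdatedBy' => 'vcrystal@usgs.gov',
  'createdBy' => 'vcrystal@usgs.gov'
}

metadata_dates = provenance_to_metadata_dates(provenance)
metadata_dates.each { |d| puts d.inspect }
```

If ScienceBase leaves createdBy or lastUpdatedBy unpopulated, the corresponding entry is emitted without a description.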

I think we have agreed on a different proposal. Can this issue be closed?