IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu and an issue tracker for the IQSS Dataverse team's operational work, for better tracking on https://github.com/orgs/IQSS/projects/34

Create metadatablock for MORU's collection of clinical and public health research data

jggautier opened this issue · comments

Sonia and I are speaking with folks at the Mahidol Oxford Tropical Medicine Research Unit (MORU) who need to publish clinical and public health research data. We're designing a metadatablock to let their team describe the datasets they plan to publish in the collection at https://dataverse.harvard.edu/dataverse/MORU (which is unpublished for now).

MORU needs to start adding datasets by the end of May 2023, so this is when they need the custom metadatablock available for their collection.

There seems to be at least some overlap between the ways in which MORU needs to describe its data and how the current "Life Sciences" metadatablock lets depositors describe data, such as similar field names and similar terms coming from different controlled vocabularies.

So changing the Life Sciences metadatablock is an option. But I think it's unlikely that we'll be able to research, design, test and implement changes to the Life Sciences metadatablock by the end of May 2023, a month from this writing.

But we can think about how to use what we learn to improve the "Life Sciences" metadatablock (which might also be a goal of at least one of our upcoming grant-funded projects).

It's possible that if later the Life Sciences metadatablock is changed and meets more of MORU's needs, Harvard Dataverse staff could work with Naomi and her colleagues to move the metadata in fields in MORU's custom metadatablock to an improved Life Sciences metadatablock.

More details
We're talking with MORU about how certain metadata fields are exported and made discoverable in different ways, and this will influence the design of the custom metadatablock.

They'll be using the Dataverse APIs to pull metadata of datasets deposited in their collection to make them searchable through another application they're working on, and we're talking with them about how to use the Dataverse APIs to access the metadata in the planned custom metadatablock (such as by getting the metadata from the "dataverse_json" or "oai_ore" exports).
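As a rough illustration of that kind of access, here is a minimal Python sketch that builds a request to the dataset export endpoint and pulls the fields of one metadatablock out of a "dataverse_json" export. The DOI and the block name `customMORU` are hypothetical placeholders (the block name is only guessed from the zip file name), and the exporter names available on a given installation should be checked against its API guide rather than taken from this sketch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://dataverse.harvard.edu"


def export_url(persistent_id, exporter="dataverse_json", base_url=BASE_URL):
    """Build the URL for a dataset's metadata export."""
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def fetch_export(persistent_id, exporter="dataverse_json"):
    """Download and parse a JSON export (works for published datasets)."""
    with urlopen(export_url(persistent_id, exporter)) as resp:
        return json.load(resp)


def block_fields(export, block_name):
    """Return {field name: value} for one metadatablock in a dataverse_json export.

    Assumes the usual layout: datasetVersion -> metadataBlocks -> <block> -> fields.
    Returns an empty dict if the block is absent.
    """
    blocks = export.get("datasetVersion", {}).get("metadataBlocks", {})
    fields = blocks.get(block_name, {}).get("fields", [])
    return {field["typeName"]: field["value"] for field in fields}


if __name__ == "__main__":
    # Hypothetical DOI and block name, for illustration only.
    export = fetch_export("doi:10.5072/FK2/EXAMPLE")
    print(block_fields(export, "customMORU"))
```

Keeping the parsing separate from the download means the same `block_fields` helper can be run against saved export files while the custom block is still being reviewed.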

A small number of people will be depositing the datasets. The dataset authors won't be doing the depositing, at least not at first. So testing should involve that small group of depositors rather than people, like the researchers, who won't be as familiar with Dataverse.

I spoke with admins of MORU's collection and we agreed to have the design of the metadatablock, that is, its TSV file, ready by May 15. This would give us about 2 weeks to add the new metadatablock to the repository, so that it's ready for use by the end of May.

During the May 8 prioritization meeting, we'll be talking about this.

Note: Needs sizing, prioritize to have it done by end of May

I've emailed the admins of the collection to ask how strict the May 31 deadline is, and whether it would be okay if the new fields were available for them to use sometime in June 2023 instead. I expect they'll get back to me by tomorrow, and I'll let @cmbz know then.

I've shared an instance of Dataverse on AWS that has the new metadatablock with the admins of the MORU collection so they can review the new fields.

The collection admin let me know that their group plans to launch their collection in a meeting in mid-June, so before then they plan to start adding datasets. And they anticipated that the Dataverse Community Meeting in early June would affect our capacity for adding the new fields on Harvard Dataverse, so they'd like the fields added by May 31 if possible.

  • Note that there are overlaps between the fields in this custom metadatablock and the existing Life Sciences metadatablock
  • Sized as 10 because it only requires deployment of the already defined metadata TSV

The TSV and .properties files are in customMORU.zip.

It's ready to be added to Harvard Dataverse.

@cmbz, could this be added to Harvard Dataverse by May 31?

It's in the global backlog under Sprint Ready. Should I move it to IQSS/dataverse board's Ready for Review column?

OK, the block should be there, but please confirm.

Thanks @landreev. It looks good to me. I'll follow up with the managers of the collection to make sure.

Thanks again @landreev. The new fields look good to the managers of the collection (and I sang your praises to them!).

I'm going to close this issue.

They'd like a change to be made, and I'll open a new issue about that.