IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu and an issue tracker for the IQSS Dataverse team's operational work, for better tracking on https://github.com/orgs/IQSS/projects/34

Implement Dropdown Selector for NIH Controlled Vocabulary in Keywords

Saixel opened this issue · comments

Background

As part of our ongoing collaboration with the CAFE project team, we have identified a need to refine the user experience in selecting controlled vocabulary terms for dataset keywords. This initiative aims to align the Dataverse metadata input process with standard vocabularies and enhance data discoverability and consistency.

Feature Request

Implement a selector (dropdown, box options, widget, etc.) that lets users select and add terms from the NIH controlled vocabulary glossary as keywords.

Current State:

  • The 'Keyword' metadata section allows for manual text entry, with the option to add additional input fields.
  • There is a textual prompt directing users to the NIH glossary website for keyword selection.

Desired Functionality:

  • Replace the manual entry system with a dropdown selector or a similar UI component.
  • Dynamically populate options within the selector based on the NIH controlled vocabulary glossary.

Justification

The CAFE project team requires a more standardized and error-proof method for keyword selection to ensure metadata quality and consistency. This enhancement will support users in accurately tagging datasets, thus facilitating better data curation and searchability.

Implementation Considerations

  • Explore the use of "Controlled Vocabulary URL" for dynamic term loading from the NIH glossary.
  • Consider the integration of a resource that has been prepared with each keyword, a description, and a URL, possibly using this as a CSV or similar format to load selector options.
  • The selector UI must be intuitive and should support multiple term additions as per dataset requirements.
  • Backend integration must ensure correct storage and handling of the selected vocabulary terms.

Additional Context

This request is driven by user feedback and the project's commitment to improving data quality and curation practices within the CAFE project's use of Dataverse. We have already compiled a comprehensive list, which includes each keyword, its description, and the associated URL, ready to be utilized for the selector feature.

Related:

However, I checked with @Saixel and he plans to implement this using a custom metadata block for the CAFE project rather than attempting to modify the keyword field in the citation block (which is what the issue above is about).

He said there are almost 300 controlled vocabulary values.

First, this should be moved to the harvard dataverse repo, as it should not require any code in the core.

Second, we're wondering about this:
"Dynamically populate options within the selector based on the NIH controlled vocabulary glossary"

Is the idea to have these values read from an existing API? If so, we would use the external CV functionality, and the best next step would be a spike to use this API and make sure there is no unexpected behavior. (That spike would likely be a size 10 for someone who already has experience with the external CV functionality.)
If not, and it's just using our external CV functionality, then all that needs to be done is to add the values to the appropriate TSV file, which can be sized as a 3.
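
For the TSV route, a minimal sketch of turning the prepared CSV into controlled vocabulary rows might look like the following. It assumes the CSV has columns named `keyword`, `description`, and `url` (the real column names may differ), uses a hypothetical field name `cafeKeyword`, and puts the term URL in the `identifier` column as one possible choice. The header line follows the `#controlledVocabulary` section layout of Dataverse metadata block TSV files; the `#metadataBlock` and `#datasetField` sections would still need to be written by hand.

```python
import csv
import io


def csv_to_cv_rows(csv_text, field_name):
    """Build the #controlledVocabulary section of a metadata block TSV.

    Assumes the CSV has 'keyword' and 'url' columns (hypothetical names).
    Blank and repeated keywords are skipped so displayOrder stays dense.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = ["#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder"]
    seen = set()
    order = 0
    for row in reader:
        value = (row.get("keyword") or "").strip()
        if not value or value in seen:
            continue
        seen.add(value)
        url = (row.get("url") or "").strip()
        # Data rows start with an empty first column, then the field name.
        lines.append("\t".join(["", field_name, value, url, str(order)]))
        order += 1
    return "\n".join(lines) + "\n"
```

Calling `csv_to_cv_rows(open("terms.csv", encoding="utf-8").read(), "cafeKeyword")` (with `terms.csv` standing in for the real file) would give text that can be pasted under the field definition in the block's TSV file.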

@pdurbin Thanks for pointing out the related issue. My initial approach was to use a custom metadata block to avoid changing the current keyword block structure. However, I see from the comment in IQSS/dataverse#10288 that an autocomplete function is suggested for a similar case. Our goal is to present a list of options for keyword selection from the prepared terms in a CSV, so either a dropdown or autocomplete could be a viable solution. If it's okay with you, we can dig deeper into this topic as we work on this implementation.

@scolapasta The issue has been moved to the Harvard Dataverse repo as per your guidance (thanks for pointing this out). Regarding the "Dynamically populate options within the selector based on the NIH controlled vocabulary glossary" feature, I'd like to clarify that we don't have an API. Instead, we have a CSV with a list of almost 300 terms. If we can use the external CV functionality you mentioned for this purpose, I would appreciate any documentation or pointers to existing implementations to explore and test this further.

I'd like to clarify that we don't have an API. Instead, we have a CSV with a list of almost 300 terms. If we can use the external CV functionality you mentioned for this purpose

I would recommend playing around with configuring Author Affiliation to look up from ROR. For config advice, please see IQSS/dataverse#10331 (comment)

That said, this feature depends on an external API (like the ROR API). So you'd need to build and host that API somehow.

It might be easier to use the database and put the 300 values in a controlled vocabulary. But if you have a plan for how to build an API and where to host it, it should be do-able. 😄
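
To give a rough sense of what "building an API" over the CSV could involve, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the sample terms are placeholders rather than real glossary entries, the `q` query parameter and the JSON response shape are made up, and the lookup scripts used by the external CV functionality may expect a different format entirely.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Placeholder terms; in practice these would be loaded from the CSV.
TERMS = [
    {"keyword": "Heat wave", "description": "placeholder", "url": "https://example.org/heat-wave"},
    {"keyword": "Air pollution", "description": "placeholder", "url": "https://example.org/air-pollution"},
    {"keyword": "Asthma", "description": "placeholder", "url": "https://example.org/asthma"},
]


def search_terms(terms, query, limit=20):
    """Return up to `limit` terms whose keyword contains `query` (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return []
    return [t for t in terms if q in t["keyword"].lower()][:limit]


class TermHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the hypothetical ?q=... parameter and answer with a JSON list.
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = json.dumps(search_terms(TERMS, query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the lookup service; blocks until interrupted."""
    HTTPServer((host, port), TermHandler).serve_forever()
```

Running `serve()` and requesting `http://127.0.0.1:8000/?q=heat` would return the matching terms as JSON. Hosting, caching, and matching whatever response format the lookup script expects are the parts that make this more work than loading the values into the database.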