IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu, plus an issue tracker for the IQSS Dataverse team's operational work that supports tracking on https://github.com/orgs/IQSS/projects/34



Spike: Investigate performance problems involving large data uploads to Harvard Dataverse

cmbz opened this issue.

Background

@sbarbosadataverse has noted that Harvard Dataverse users have had difficulty uploading large data files (hundreds of GB) to the repository through the UI, and that these users have asked her to upload the files on their behalf. She has uploaded large files successfully over some WiFi connections but not others. In particular, uploads over Harvard WiFi fail repeatedly.
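To put "hundreds of GB" in perspective, the helper below estimates how long a single upload has to stay connected at a given sustained throughput. This is a minimal sketch of plain arithmetic. The throughput values in the usage line are illustrative assumptions, not measurements of Harvard WiFi or of any wired network.

```python
def estimate_upload_hours(size_gb: float, throughput_mbps: float) -> float:
    """Estimate hours needed to transfer size_gb decimal gigabytes (10**9 bytes)
    at a sustained throughput of throughput_mbps megabits per second (10**6 bit/s).

    Ignores protocol overhead and retries, so real uploads will take longer.
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    bits = size_gb * 1e9 * 8                       # bytes -> bits
    seconds = bits / (throughput_mbps * 1e6)       # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed throughputs for illustration only: a weak WiFi link vs. a wired one.
    for mbps in (20, 100, 1000):
        print(f"100 GB at {mbps} Mbit/s: {estimate_upload_hours(100, mbps):.1f} h")
```

At an assumed 100 Mbit/s, a 100 GB file needs roughly 2.2 hours of uninterrupted transfer. At an assumed 20 Mbit/s it needs about 11 hours. The longer a single browser upload must stay connected, the more exposed it is to a dropped or throttled connection.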

Goals

  • Investigate and document why uploads of large files (e.g., hundreds of GB) to Harvard Dataverse hang or fail
    • Identify and document the limits of self-service uploads through the UI, comparing WiFi and wired connections
  • Investigate and document alternative approaches to uploading large files, both mediated (e.g., rsync) and self-service (e.g., API/scripts)
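As a starting point for the self-service API route, the sketch below assembles the URL and auth header for the Dataverse native API "add file" call, `POST /api/datasets/:persistentId/add?persistentId=...` with an `X-Dataverse-key` header. The function name `build_add_file_request` and the example server, DOI and token are hypothetical placeholders. The sketch deliberately stops short of sending the file, because how a multi-hundred-GB file should be streamed (or whether the installation offers a different route for large files) is exactly what this spike needs to establish.

```python
from urllib.parse import urlencode


def build_add_file_request(server_url: str, persistent_id: str, api_token: str):
    """Return (url, headers) for the Dataverse native API "add file" call.

    Hypothetical helper: server_url, persistent_id and api_token are supplied
    by the caller. The file itself would go in a multipart form field named
    "file", optionally alongside a "jsonData" field with file metadata.
    """
    # urlencode percent-encodes ':' and '/' in the DOI, e.g. doi%3A10.5072%2F...
    query = urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?{query}"
    headers = {"X-Dataverse-key": api_token}
    return url, headers


if __name__ == "__main__":
    url, headers = build_add_file_request(
        "https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE", "YOUR-API-TOKEN"
    )
    print(url)
```

The Dataverse API Guide shows the same call made with curl (`-H "X-Dataverse-key:$API_TOKEN"` and `-F "file=@$FILENAME"`). Whatever client the spike settles on, a script can be rerun per file without depending on a browser session.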

2023/11/06: Added this issue to the Global Backlog as an outcome of discussion during the prioritization meeting. Adding it to the 6.2 proposals because of its relevance to NIH GREI.