IQSS / dataverse-client-r

R Client for Dataverse Repositories

Home Page: https://iqss.github.io/dataverse-client-r

Support `versions` argument in get_file/get_dataframe

kuriwaki opened this issue

For datasets that are updated periodically (e.g. this one is on V6), it would be good for the get_* functions to have an option to specify the version number.

It looks like there is an API for that:
https://guides.dataverse.org/en/latest/api/dataaccess.html#download-by-dataset-by-version

However, see #27 for an error in `dataset_versions()`.
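To make the linked endpoint concrete, here is a minimal sketch of how the request URL could be assembled. `version_bundle_url()` is a hypothetical helper, not part of the dataverse package, and the path pattern is my reading of the "download by dataset by version" section of the guide, so please check it against the docs before relying on it. No request is sent; the function only builds a string.

```r
# Hypothetical helper (NOT part of the dataverse package): builds the URL for
# the "download by dataset by version" Data Access endpoint.
# Assumption: the path pattern below matches the linked API guide.
version_bundle_url <- function(doi, version, server = Sys.getenv("DATAVERSE_SERVER")) {
  # Fail early if no server was given and the environment variable is unset
  stopifnot(nzchar(server))
  paste0(
    "https://", server,
    "/api/access/dataset/:persistentId/versions/", version,
    "?persistentId=", doi
  )
}

# Pass the version as a string so that "1.0" is not collapsed to "1"
url <- version_bundle_url("doi:10.70122/FK2/PPIAXE", "1.0", server = "demo.dataverse.org")
url
#> [1] "https://demo.dataverse.org/api/access/dataset/:persistentId/versions/1.0?persistentId=doi:10.70122/FK2/PPIAXE"
```

As far as I can tell this endpoint returns a bundle of all files in that version, so it would not replace a per-file lookup in get_file/get_dataframe.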

We already seem to have this; it's just not documented, and the argument is passed through `...`. However, files that no longer exist in `:latest` are not found. This should be ready once that is fixed.

library(dataverse)
library(readr)
packageVersion("dataverse")
#> [1] '0.3.10'

# setup
doi <- "doi:10.70122/FK2/PPIAXE"
Sys.setenv(DATAVERSE_SERVER = "demo.dataverse.org")

fun <- function(x) read_tsv(x, col_types = cols())


# Expected Success
d1 <- get_dataframe_by_name("nlsw88.tab", doi, .f = fun, version = 1)
d2 <- get_dataframe_by_name("nlsw88.tab", doi, .f = fun, version = 1.1)

# Expected ERROR - version 99 does not exist
d3 <- get_dataframe_by_name("nlsw88.tab", doi, .f = fun, version = 99)
#> Error in dataset_files(prepend_doi(x), key = key, server = server, ...): Not Found (HTTP 404). Failed to Dataset version 99 of dataset 1734015 not found.


# ERROR to fix
# A filename that no longer exists on latest version
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GDF6Z0&version=4.0
cc <- get_dataframe_by_name(
  filename = "CCES16_Common_OUTPUT_Jul2017_VV.tab",
  version = 4,
  dataset = "10.7910/DVN/GDF6Z0",
  server = "dataverse.harvard.edu",
  original = TRUE,
  .f = haven::read_dta)
#> Error in get_fileid.character(x = dataset, file = filename, ...): File not found

Created on 2022-01-12 by the reprex package (v2.0.1)
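One possible shape for the fix to the "File not found" case above is to resolve the filename against the file list of the requested version rather than the latest one. The sketch below is only an illustration: `find_file_id()` is a hypothetical helper, the list structure (`label` and `dataFile$id` per entry) is my assumption about what a parsed file listing looks like, and the ids and filenames are made up.

```r
# Hypothetical helper: look up a file id by name within ONE version's file list.
# Assumption: `files` is a list of entries, each with `label` and `dataFile$id`.
find_file_id <- function(files, filename) {
  labels <- vapply(files, function(f) f[["label"]], character(1))
  hit <- which(labels == filename)
  if (length(hit) == 0L) {
    stop("File '", filename, "' not found in the requested version", call. = FALSE)
  }
  # If several entries share the name, take the first match
  files[[hit[1]]][["dataFile"]][["id"]]
}

# Toy stand-ins for two versions of the same dataset (made-up names and ids)
v4_files     <- list(list(label = "old_name.tab", dataFile = list(id = 101L)))
latest_files <- list(list(label = "new_name.tab", dataFile = list(id = 202L)))

# Searching the version that actually contains the file succeeds
find_file_id(v4_files, "old_name.tab")
#> [1] 101

# Searching the latest file list for the old name reproduces the failure mode
res <- try(find_file_id(latest_files, "old_name.tab"), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE
```

In the package, the per-version list would presumably come from something like `dataset_files()` called with the same `version` value, but I have not checked how that interacts with `original = TRUE`.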