IQSS / dataverse-client-r

R Client for Dataverse Repositories

Home Page: https://iqss.github.io/dataverse-client-r

Improve doc on how to read objects without object assignment

kuriwaki opened this issue

An .RData file cannot be read into an object of the user's choosing: load() restores the saved objects into the user's environment under their original names. I think we should all be switching to .rds (see IQSS/dataverse#7249), but some files on Dataverse are still uploaded as .RData.
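To make the difference concrete, here is a minimal local sketch that needs no Dataverse access. The toy data frame `df` and the temporary file paths are made up for illustration; only base R functions are used.

```r
# Toy data frame standing in for a downloaded dataset (hypothetical)
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

rdata_path <- tempfile(fileext = ".RData")
rds_path   <- tempfile(fileext = ".rds")

save(df, file = rdata_path)   # stores the object together with its name, "df"
saveRDS(df, file = rds_path)  # stores only the object, without a name

rm(df)

# load() recreates `df` in the calling environment and returns only the
# names of the restored objects, not the objects themselves
loaded_names <- load(rdata_path)
print(loaded_names)

# readRDS() returns the object, so the caller chooses the name
my_data <- readRDS(rds_path)
print(identical(my_data, df))
```

This is why .rds fits a function-style reader directly, while .RData needs a small wrapper around load().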

It turns out there are two ways to load such a file. One is the old way: write the raw binary to disk and re-read it with load(). The other is to load the file into a small temporary environment inside a function, as I found on Stack Overflow. Both are in the reprex below, and they give identical objects.

We should update the doc with an example.

h/t @jonrobinson2

library(dataverse)
library(fs)


# Algara dataset
# https://dataverse.harvard.edu/file.xhtml?fileId=5028532&version=1.0

# 1. Writing the raw bytes to disk and then calling load() works
as_binary <- get_file_by_id(fileid = 5028532, server = "dataverse.harvard.edu")

temp <- tempdir()
writeBin(as_binary, path(temp, "county.RData"))
load(path(temp, "county.RData"))

str(pres_elections_release)
#> 'data.frame':    113756 obs. of  20 variables:
#>  $ election_year                        : num  1868 1872 1876 1880 1884 ...
#>  $ fips                                 : chr  "01001" "01001" "01001" "01001" ...
#>  $ county_name                          : chr  "AUTAUGA" "AUTAUGA" "AUTAUGA" "AUTAUGA" ...
#>  $ state                                : chr  "AL" "AL" "AL" "AL" ...
#>  $ sfips                                : chr  "01" "01" "01" "01" ...
#>  $ office                               : chr  "PRES" "PRES" "PRES" "PRES" ...
#>  $ election_type                        : chr  "G" "G" "G" "G" ...
#>  $ seat_status                          : chr  "Open Seat" "Republican President Re-election" "Open Seat" "Open Seat" ...
#>  $ democratic_raw_votes                 : num  851 669 804 978 911 ...
#>  $ dem_nominee                          : chr  "Horatio Seymour" "Horace Greeley" "Samuel J. Tilden" "Winfield Scott Hancock" ...
#>  $ republican_raw_votes                 : num  1505 1593 1576 974 877 ...
#>  $ rep_nominee                          : chr  "Ulysses S. Grant" "Ulysses S. Grant" "Rutherford B. Hayes" "James A. Garfield" ...
#>  $ pres_raw_county_vote_totals_two_party: num  2356 2262 2380 1952 1788 ...
#>  $ raw_county_vote_totals               : num  2356 2262 2380 1967 1789 ...
#>  $ county_first_date                    : Date, format: "1818-11-21" "1818-11-21" ...
#>  $ county_end_date                      : Date, format: NA NA ...
#>  $ state_admission_date                 : chr  "1819-12-14" "1819-12-14" "1819-12-14" "1819-12-14" ...
#>  $ complete_county_cases                : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ original_county_name                 : chr  NA NA NA NA ...
#>  $ original_name_end_date               : Date, format: NA NA ...


# 2. How about reading directly into R? This is an .RData file, which is usually read with load().

# via: https://stackoverflow.com/questions/34925668/r-assign-content-from-rda-object-with-load
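load_object <- function(file) {
  tmp <- new.env()                # scratch environment, so nothing leaks into the workspace
  load(file = file, envir = tmp)
  tmp[[ls(tmp)[1]]]               # return the first object by name order; any others are dropped
}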


as_rda <- get_dataframe_by_id(fileid = 5028532,
                              server = "dataverse.harvard.edu",
                              .f = load_object,
                              original = TRUE)

identical(as_rda, pres_elections_release)
#> [1] TRUE

Created on 2021-09-16 by the reprex package (v2.0.1)
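One caveat with `load_object()` is that it returns only the first object, so an .RData file holding several objects loses the rest. A possible variant, sketched below, returns everything as a named list. The name `load_objects` and the toy objects `a` and `b` are hypothetical, and the example runs locally on a temporary file rather than against Dataverse.

```r
# Hypothetical variant of load_object(): return every saved object
load_objects <- function(file) {
  tmp <- new.env()
  load(file = file, envir = tmp)
  # ls() sorts names alphabetically, so the list order is predictable
  mget(ls(tmp), envir = tmp)
}

# Toy .RData file with two objects (made up for illustration)
a <- 1:3
b <- c("x", "y")
path <- tempfile(fileext = ".RData")
save(a, b, file = path)

objs <- load_objects(path)
print(names(objs))
str(objs)
```

Presumably this could be passed as `.f = load_objects` in the same way as `load_object` above, in which case the call would return the list instead of a single data frame.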

@Danny-dK's proposal is more concise:

get_dataframe_by_doi(
  filedoi = "10.70122/FK2/PPIAXE/X2FC5V",
  server = "demo.dataverse.org",
  original = TRUE,
  .f = function(x) load(x, envir = .GlobalEnv))
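Note what this shorter form does: load() puts the objects into the global environment as a side effect and returns only their names. The local sketch below shows that behaviour without contacting a server. The file and the `survey` data frame are made up, and `load_into_global` is just a name given here to the anonymous function above.

```r
# Hypothetical stand-in for the downloaded file that is handed to `.f`
path <- tempfile(fileext = ".RData")
survey <- data.frame(id = 1:2, score = c(3.5, 4.0))
save(survey, file = path)
rm(survey)

# Same body as the anonymous function in the proposal above
load_into_global <- function(x) load(x, envir = .GlobalEnv)

result <- load_into_global(path)
print(result)                                # names of the restored objects
print(exists("survey", envir = .GlobalEnv))  # the data itself is now in the workspace
```

Assuming `get_dataframe_by_doi()` returns whatever `.f` returns, as `get_dataframe_by_id()` did with `load_object` in the reprex, the value of the call would be this character vector, and any existing global object with the same name would be overwritten.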

I have made this change in dev: f33e578

Implemented in 0.3.14