Error in prepend_doi(filedoi) : argument "filedoi" is missing, with no default
sjkiss opened this issue
I would like to download the Stata tab file here but am getting this error. I'm running dataverse 0.3.7.
Note that I'm pretty sure I've set the server and the key properly.
```
Error in prepend_doi(filedoi) :
  argument "filedoi" is missing, with no default
```
## Code

```r
library(dataverse)

get_dataframe_by_doi(
  filename = "2019 Canadian Election Study - Online Survey v1.0.tab",
  original = TRUE,
  .f = haven::read_dta,
  dataset = "doi:10.7910/DVN/DUS88V",
  server = "dataverse.harvard.edu"
)
```
## Session info

```
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] dataverse_0.3.7 labelled_2.8.0  cesdata_0.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        knitr_1.31        magrittr_2.0.1
 [4] hms_1.0.0         tidyselect_1.1.0  R6_2.5.0
 [7] rlang_0.4.10      fansi_0.4.2       stringr_1.4.0
[10] dplyr_1.0.5       tools_4.0.4       xfun_0.22
[13] utf8_1.1.4        DBI_1.1.1         htmltools_0.5.1.1
[16] ellipsis_0.3.1    assertthat_0.2.1  yaml_2.2.1
[19] digest_0.6.27     tibble_3.1.0      lifecycle_1.0.0
[22] crayon_1.4.1      tidyr_1.1.3       purrr_0.3.4
[25] vctrs_0.3.6       glue_1.4.2        evaluate_0.14
[28] haven_2.3.1       rmarkdown_2.7     stringi_1.5.3
[31] compiler_4.0.4    pillar_1.5.1      forcats_0.5.1
[34] generics_0.1.0    pkgconfig_2.0.3
```
@sjkiss, thanks for filing such a thorough issue. I think you can get your desired output with two changes:

- change `dataverse::get_dataframe_by_doi()` to `dataverse::get_dataframe_by_name()`, and
- change the file extension from "tab" (the file created by Dataverse during ingest) to "dta" (the original Stata file extension), because you specified `original = TRUE`.
@pdurbin, this dataset doesn't have a *.tab file, unlike a test dataset with a Stata file. I see this dataset was uploaded in May 2020; did the server software start converting Stata files between then and Dec 2020?
```r
dataverse::get_dataframe_by_name(
  filename = "2019 Canadian Election Study - Online Survey v1.0.dta",
  original = TRUE,
  .f = haven::read_dta,
  dataset = "doi:10.7910/DVN/DUS88V",
  server = "dataverse.harvard.edu"
)
```
Output:

```
# A tibble: 37,822 x 620
   cps19_StartDate     cps19_EndDate       cps19_ResponseId  cps19_consent
   <dttm>              <dttm>              <chr>             <dbl+lbl>
 1 2019-09-13 08:09:44 2019-09-13 08:36:19 R_1OpYXEFGzHRUpjM 1 [I consent to par~
 2 2019-09-13 08:39:09 2019-09-13 08:57:06 R_2qdrL3J618rxYW0 1 [I consent to par~
 3 2019-09-13 10:01:19 2019-09-13 10:27:29 R_USWDAPcQEQiMmNb 1 [I consent to par~
 4 2019-09-13 10:05:37 2019-09-13 10:50:53 R_3IQaeDXy0tBzEry 1 [I consent to par~
 5 2019-09-13 10:05:52 2019-09-13 10:32:53 R_27WeMQ1asip2cMD 1 [I consent to par~
 6 2019-09-13 10:10:20 2019-09-13 10:29:45 R_3LiGZcCWJEcWV4P 1 [I consent to par~
 7 2019-09-13 10:14:47 2019-09-13 10:32:32 R_1Iu8R1UlYzVMycz 1 [I consent to par~
 8 2019-09-13 10:15:39 2019-09-13 10:30:59 R_2EcS26hqrcVYlab 1 [I consent to par~
 9 2019-09-13 10:15:48 2019-09-13 10:37:45 R_3yrt44wqQ1d4VRn 1 [I consent to par~
10 2019-09-13 10:16:08 2019-09-13 10:40:14 R_10OBmXJyvn8feYQ 1 [I consent to par~
# ... with 37,812 more rows, and 616 more variables:
#   cps19_citizenship <dbl+lbl>, cps19_yob <dbl+lbl>,
#   cps19_yob_2001_age <dbl+lbl>, cps19_gender <dbl+lbl>,
#   cps19_province <dbl+lbl>, cps19_education <dbl+lbl>, cp
...
```
Dataverse source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DUS88V
Yes, the main fix is using `get_dataframe_by_name()` since you are specifying a name, as Will wrote. It seems the datafile in question also has its own DOI (https://doi.org/10.7910/DVN/DUS88V/RZFNOV), so `get_dataframe_by_doi()` should work too, with that DOI as `filedoi` and no `filename` argument.
The lack of ingest is something I've spotted recently, especially for large files. If it is systematic, it is probably a separate issue.
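One way to check ingest status yourself is to list the dataset's files with `dataverse::dataset_files()` and look for an `originalFileFormat` entry, which ingested tabular files carry in their metadata. The `ingest_status()` helper below is hypothetical (not part of the package), and the mock metadata only assumes the shape of the API response:

```r
# Hypothetical helper: given a list of file metadata in the shape returned by
# dataverse::dataset_files(), flag which files were ingested. Ingested tabular
# files record the original format in dataFile$originalFileFormat (assumption).
ingest_status <- function(files) {
  vapply(files, function(f) {
    ingested <- !is.null(f$dataFile$originalFileFormat)
    paste0(f$label, ": ", if (ingested) "ingested" else "not ingested")
  }, character(1))
}

# Mock metadata standing in for a real call such as
# dataset_files("doi:10.7910/DVN/DUS88V", server = "dataverse.harvard.edu")
mock <- list(
  list(label = "survey.tab",
       dataFile = list(originalFileFormat = "application/x-stata-15")),
  list(label = "codebook.pdf",
       dataFile = list())
)

ingest_status(mock)
# returns a character vector flagging each file as ingested / not ingested
```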
Here's @kuriwaki's version using `dataverse::get_dataframe_by_doi()`. Notice two differences:

- the function name, and
- the `filedoi` parameter is used instead of `filename` (and the datafile-specific "RZFNOV" suffix is added to the DOI).
```r
dataverse::get_dataframe_by_doi(
  filedoi = "https://doi.org/10.7910/DVN/DUS88V/RZFNOV",
  original = TRUE,
  .f = haven::read_dta,
  server = "dataverse.harvard.edu"
)
```
Output:

```
# A tibble: 37,822 x 620
   cps19_StartDate     cps19_EndDate       cps19_ResponseId  cps19_consent
   <dttm>              <dttm>              <chr>             <dbl+lbl>
 1 2019-09-13 08:09:44 2019-09-13 08:36:19 R_1OpYXEFGzHRUpjM 1 [I consent to par~
 2 2019-09-13 08:39:09 2019-09-13 08:57:06 R_2qdrL3J618rxYW0 1 [I consent to par~
 3 2019-09-13 10:01:19 2019-09-13 10:27:29 R_USWDAPcQEQiMmNb 1 [I consent to par~
 4 2019-09-13 10:05:37 2019-09-13 10:50:53 R_3IQaeDXy0tBzEry 1 [I consent to par~
 5 2019-09-13 10:05:52 2019-09-13 10:32:53 R_27WeMQ1asip2cMD 1 [I consent to par~
 6 2019-09-13 10:10:20 2019-09-13 10:29:45 R_3LiGZcCWJEcWV4P 1 [I consent to par~
 7 2019-09-13 10:14:47 2019-09-13 10:32:32 R_1Iu8R1UlYzVMycz 1 [I consent to par~
 8 2019-09-13 10:15:39 2019-09-13 10:30:59 R_2EcS26hqrcVYlab 1 [I consent to par~
 9 2019-09-13 10:15:48 2019-09-13 10:37:45 R_3yrt44wqQ1d4VRn 1 [I consent to par~
10 2019-09-13 10:16:08 2019-09-13 10:40:14 R_10OBmXJyvn8feYQ 1 [I consent to par~
# ... with 37,812 more rows, and 616 more variables:
#   cps19_citizenship <dbl+lbl>, cps19_yob <dbl+lbl>,
#   cps19_yob_2001_age <dbl+lbl>, cps19_gender <dbl+lbl>,
#   cps19_province <dbl+lbl>, cps19_education <dbl+lbl>, cps19_demsat <dbl+lbl>,
#   cps19_imp_iss <chr>, cps19_imp_iss_party <dbl+lbl>,
...
```
> @pdurbin, this dataset doesn't have a *.tab file, unlike a test dataset with a Stata file. I see this dataset was uploaded in May 2020; did the server software start converting Stata files between then and Dec 2020?
Like @kuriwaki said, it's probably because the Stata file is on the large side at 184 MB. Dataverse can be configured with different ingest thresholds for Excel vs. CSV vs. Stata vs. SPSS vs. RData; if the file is too big, Dataverse won't even try to ingest it. I'm not sure what these thresholds are for Harvard Dataverse. The setting is called `:TabularIngestSizeLimit` if you're interested: https://guides.dataverse.org/en/5.3/installation/config.html#tabularingestsizelimit
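For anyone running their own installation, that setting can be read and changed through the admin settings API. A sketch, assuming a standard local installation with the admin API reachable on localhost; the byte values are illustrative:

```shell
# Read the current general ingest size limit (unset means no limit)
curl http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit

# Set a general limit of ~2 GB
curl -X PUT -d 2147483648 \
  http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit

# Or set a format-specific limit for Stata (.dta) files only
curl -X PUT -d 2147483648 \
  http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit:DTA
```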
It's also possible that the Stata file failed ingest, but you can only tell if you're logged in and have access.
A final possibility is that the file was saved by a newer version of Stata than Dataverse supports, which is Stata 15: https://guides.dataverse.org/en/5.3/user/tabulardataingest/supportedformats.html (I guess this would also show up as a failure).
@sjkiss, I believe this issue has addressed your main concern and provided answers to the auxiliary questions that popped up. Please reopen the issue if not.
> Dataverse can be configured with different thresholds for Excel vs. CSV vs. Stata vs. SPSS vs. RData. Basically, if the file is too big, Dataverse won't even try to ingest it. I'm not sure what these thresholds are for Harvard Dataverse.
I just wanted to point out that even large files can be ingested after the fact, as needed. Here's an example: IQSS/dataverse.harvard.edu#103
Thanks all!