pathname with `?` in windows blocking download
Eluvias opened this issue · comments
Ref: eddelbuettel/td#1
The first URL fails because the path name constructed for `destfile` in `download.file()` includes `?` and is therefore not valid on Windows. The issue occurs on the first URL because its last path separator comes before the `?`, so when `basename()` is called inside `.prep_input()` the `?` remains. See below.
qry1 <- "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
qry2 <- "https://api.twelvedata.com/time_series?symbol=EUR/USD&interval=1min&apikey=demo&source=docs"
RcppSimdJson:::.prep_input(qry1, temp_dir = tempdir(),
compressed_download = FALSE,
verbose = FALSE)
#> Error in download.file(url = .url, destfile = .destfile, method = .method, : cannot open destfile 'C:\Users\Eluvias\AppData\Local\Temp\RtmpCoLe99\time_series?symbol=VTI&interval=1min&apikey=demo&source=docs39644039333f', reason 'Invalid argument'
RcppSimdJson:::.prep_input(qry2, temp_dir = tempdir(),
compressed_download = FALSE,
verbose = FALSE)
#> input
#> 1 C:\\Users\\Eluvias\\AppData\\Local\\Temp\\RtmpCoLe99\\USD&interval=1min&apikey=demo&source=docs396472246445
#>   url_prefix file_ext is_from_url is_local_file_url is_remote_file_url
#> 1   https://                 TRUE             FALSE               TRUE
Part of destfile
# basename is called inside .prep_input()
basename(qry1)
#> [1] "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
basename(qry2)
#> [1] "USD&interval=1min&apikey=demo&source=docs"
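As a rough way to see which of the two basenames would trip up Windows, one could test them against the characters Windows rejects in file names (`< > : " / \ | ? *`). This is only a sketch: the variable names `windows_reserved` and `has_reserved` are made up for illustration and nothing here is taken from the package source.

```r
# Characters that Windows does not allow in file names: < > : " / \ | ? *
# (illustrative pattern, not taken from RcppSimdJson)
windows_reserved <- "[<>:\"/\\\\|?*]"

qry1 <- "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
qry2 <- "https://api.twelvedata.com/time_series?symbol=EUR/USD&interval=1min&apikey=demo&source=docs"

# TRUE means the basename contains a character Windows would reject
has_reserved <- grepl(windows_reserved, basename(c(qry1, qry2)), perl = TRUE)
print(has_reserved)
#> [1]  TRUE FALSE
```

The second URL only passes because the `/` in `EUR/USD` happens to sit after the `?`, so `basename()` cuts the `?` away.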
Very, very nicely diagnosed.
There must be some sort of filename normalizer we should have called here but haven't. We'll add this. As luck would have it, we of course just made a release two days ago :-/ but I will look into squashing this.
I thought that there was a base R function with a name around *sanitize* but I am not seeing it. What does come up is in package fs, which on the margin we'd rather not depend upon. Can you check if a path like this (relative to your R session's `tempdir()`) would work?
> fs::path_sanitize("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")
[1] "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs"
>
If so then it would appear that we can fix this with a single call to `gsub()`.
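A minimal sketch of what such a `gsub()` call might look like, assuming we want to replace every character Windows rejects rather than only `?`. The helper name `sanitize_filename` and its `replacement` argument are hypothetical and not part of RcppSimdJson or base R.

```r
# Hypothetical helper (not part of RcppSimdJson): replace the characters
# Windows forbids in file names (< > : " / \ | ? *) with a placeholder.
sanitize_filename <- function(x, replacement = "_") {
  gsub("[<>:\"/\\\\|?*]", replacement, x, perl = TRUE)
}

cleaned <- sanitize_filename("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")
print(cleaned)
#> [1] "time_series_symbol=VTI&interval=1min&apikey=demo&source=docs"
```

Unlike `fs::path_sanitize()`, which drops the `?` in the output shown above, this variant substitutes `_` so the original word boundaries stay visible.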
Or, come to think of it, maybe just call `tempfile()` instead.
@knapply thoughts on just relying on `tempfile()` for the destination file? It should have all the required logic for passing on every architecture and OS, avoiding data races and name clashes and what not. Shall we do that?
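For illustration, relying purely on `tempfile()` could look roughly like the following. The prefix `"rcppsimdjson_"` and the `".json"` extension are assumptions chosen for the example, not what the package uses.

```r
# Assumption: prefix and extension are illustrative choices only.
# Nothing from the URL ends up in the file name, so no "?" can leak in.
dest <- tempfile(pattern = "rcppsimdjson_", tmpdir = tempdir(), fileext = ".json")

# The generated name is the prefix, a random suffix, and the extension
print(basename(dest))

# Not run here because it needs network access:
# download.file(url = qry1, destfile = dest, mode = "wb")
```

The trade-off is that the temporary file name no longer hints at which URL it came from.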
Yeah it works:
> fs::path_sanitize("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")
[1] "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs"
So we are in fact already using `tempfile()` on the set of results; we are "just" lacking this transformation:
temp_files[diagnosis$is_from_url] <- tempfile(
    pattern = .drop_file_ext(basename(diagnosis$input[diagnosis$is_from_url]), diagnosis$file_ext[diagnosis$is_from_url]),
    tmpdir = normalizePath(temp_dir),
    fileext = .fileext[diagnosis$is_from_url]
)
and this is right where we meet your new best friend `basename()`...
Actually, sorry, you just validated that you can call `fs::path_sanitize()`. Can you please feed the resulting string into `dir.create()` or alike, as in
> dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")))
>
or even
> mydir <- dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")))
Warning message:
In dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"))) :
'/tmp/RtmpmcvhAh/time_series_symbol=VTI&interval=1min&apikey=demo&source=docs' already exists
>
where I of course got a warning because I did it twice. Maybe feeding the string as pattern into `tempfile()` would have been smarter...
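A sketch of that alternative, feeding the cleaned string to `tempfile()` as its `pattern`, might look like this. The variable names are illustrative, and the check uses `file.create()` rather than `dir.create()` since the real destination is a file.

```r
raw_name <- "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"

# Same substitution as in the dir.create() calls above: swap "?" for "_"
safe_pattern <- gsub("[?]", "_", raw_name)

# tempfile() appends a random suffix and avoids names already in use,
# so running this twice does not trigger an "already exists" warning.
dest1 <- tempfile(pattern = safe_pattern, tmpdir = tempdir())
ok1 <- file.create(dest1)
dest2 <- tempfile(pattern = safe_pattern, tmpdir = tempdir())
ok2 <- file.create(dest2)

print(c(ok1, ok2))

# Clean up the two test files
unlink(c(dest1, dest2))
```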
Apologies:
(mydir <- dir.create(file.path(tempdir(), "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs")))
[1] TRUE
# with illegal symbol
(mydir <- dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"))))
[1] TRUE
Nice. That is looking promising. I made the simple change and sent it to win-builder and I will paste the URL here once I hear back. You could then try the windows build it produces (if you trust that R Project build machine etc).
Please try the binary from here:
Dear package maintainer,
this notification has been generated automatically.
Your package RcppSimdJson_0.1.4.1.tar.gz has been built (if working) and checked for Windows.
Please check the log files and (if working) the binary package at:
https://win-builder.r-project.org/oSsqYoSHo5kU
The files will be removed after roughly 72 hours.
Installation time in seconds: 208
Check time in seconds: 172
Status: 2 NOTEs
R version 4.0.3 (2020-10-10)
And just because there was a weird note in that log (about `.github`, which is not in our source) I also sent it to RHub:
See the full build log:
HTML: https://builder.r-hub.io/status/RcppSimdJson_0.1.4.1.tar.gz-363c5233bb4c4244a788ec4af93babd7
Text: https://builder.r-hub.io/status/original/RcppSimdJson_0.1.4.1.tar.gz-363c5233bb4c4244a788ec4af93babd7
Artifacts: https://artifacts.r-hub.io/RcppSimdJson_0.1.4.1.tar.gz-363c5233bb4c4244a788ec4af93babd7
The artifacts link has a binary for you too.
Excellent!!
qry1 <- "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
RcppSimdJson::fload(qry1)
#> $meta
#> $meta$symbol
#> [1] "VTI"
#>
#> $meta$interval
#> [1] "1min"
#>
#> $meta$currency
#> [1] "USD"
#>
#> $meta$exchange_timezone
#> [1] "America/New_York"
#>
#> $meta$exchange
#> [1] "NYSE"
#>
#> $meta$type
#> [1] "ETF"
#>
#>
#> $values
#> datetime open high low close volume
#> 1 2021-02-12 15:59:00 207.42000 207.49001 207.37000 207.44000 67274
#> 2 2021-02-12 15:58:00 207.38000 207.41000 207.38000 207.41000 10739
#> 3 2021-02-12 15:57:00 207.35001 207.37000 207.32001 207.37000 14337
#> 4 2021-02-12 15:56:00 207.39500 207.39500 207.32430 207.32430 7079
#> 5 2021-02-12 15:55:00 207.35001 207.35001 207.28999 207.35001 23860
#> 6 2021-02-12 15:54:00 207.37000 207.40199 207.32001 207.35001 36724
#> 7 2021-02-12 15:53:00 207.28000 207.39000 207.28000 207.38000 15897
#> 8 2021-02-12 15:52:00 207.24001 207.31000 207.24001 207.27000 9655
#> 9 2021-02-12 15:51:00 207.19501 207.24001 207.17999 207.23000 13314
#> 10 2021-02-12 15:50:00 207.17000 207.20500 207.12000 207.20500 18703
#> 11 2021-02-12 15:49:00 207.03000 207.08501 207.02400 207.06960 9398
#> 12 2021-02-12 15:48:00 206.97501 207.03000 206.97501 207.01500 5336
#> 13 2021-02-12 15:47:00 206.96660 206.98500 206.96001 206.97501 9431
#> 14 2021-02-12 15:46:00 206.97000 206.98000 206.96201 206.96500 3145
#> 15 2021-02-12 15:45:00 206.98500 206.98981 206.98500 206.98981 4467
#> 16 2021-02-12 15:44:00 207.02000 207.03999 206.98500 206.98500 12226
#> 17 2021-02-12 15:43:00 207.01010 207.01430 207.00000 207.00500 6148
#> 18 2021-02-12 15:42:00 207.01500 207.03999 207.00999 207.01500 5203
#> 19 2021-02-12 15:41:00 206.99969 207.00500 206.97000 207.00000 6203
#> 20 2021-02-12 15:40:00 206.97501 207.03000 206.97501 207.01340 23694
#> 21 2021-02-12 15:39:00 206.97360 206.99001 206.95090 206.97501 5859
#> 22 2021-02-12 15:38:00 206.96899 206.99001 206.96500 206.98260 18887
#> 23 2021-02-12 15:37:00 206.89819 206.97501 206.88000 206.96001 6236
#> 24 2021-02-12 15:36:00 206.88000 206.92999 206.88000 206.92999 6444
#> 25 2021-02-12 15:35:00 206.78999 206.88000 206.78999 206.88000 5997
#> 26 2021-02-12 15:34:00 206.76500 206.78999 206.73640 206.78999 10608
#> 27 2021-02-12 15:33:00 206.77499 206.78999 206.75999 206.77000 6953
#> 28 2021-02-12 15:32:00 206.73500 206.78210 206.73500 206.78210 6500
#> 29 2021-02-12 15:31:00 206.73199 206.75000 206.73000 206.73000 6473
#> 30 2021-02-12 15:30:00 206.72000 206.73689 206.72000 206.73689 1131
#>
#> $status
#> [1] "ok"
packageVersion("RcppSimdJson")
#> [1] '0.1.4.1'
Effing wicked. Thank you so much for the bug report, and for so promptly putting the finger on the wound.
I will close the ticket over at td, as that package is "innocent", but mark this in the README, and will upload a fixed RcppSimdJson real soon.
Likewise and always my pleasure.