eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library

Pathname with `?` in Windows blocking download

Eluvias opened this issue.

Ref: eddelbuettel/td#1

The first URL fails because the path name constructed for destfile in download.file() includes ?, which is not valid in a Windows file name. The issue only affects the first URL because its last path separator comes before the ?, so when basename() is called inside .prep_input() the ? remains. In the second URL, the / in EUR/USD comes after the ?, so basename() happens to cut it off. See below.

qry1 <- "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
qry2 <- "https://api.twelvedata.com/time_series?symbol=EUR/USD&interval=1min&apikey=demo&source=docs"

RcppSimdJson:::.prep_input(qry1, temp_dir = tempdir(),
                           compressed_download = FALSE,
                           verbose = FALSE)
#> Error in download.file(url = .url, destfile = .destfile, method = .method, : cannot open destfile 'C:\Users\Eluvias\AppData\Local\Temp\RtmpCoLe99\time_series?symbol=VTI&interval=1min&apikey=demo&source=docs39644039333f', reason 'Invalid argument'

RcppSimdJson:::.prep_input(qry2, temp_dir = tempdir(),
                           compressed_download = FALSE,
                           verbose = FALSE)
#>                                                                                                             input
#> 1 C:\\Users\\Eluvias\\AppData\\Local\\Temp\\RtmpCoLe99\\USD&interval=1min&apikey=demo&source=docs396472246445
#>   url_prefix file_ext is_from_url is_local_file_url is_remote_file_url
#> 1   https://                 TRUE             FALSE               TRUE

The part of the URL that ends up in destfile:

# basename is called inside .prep_input()
basename(qry1)
#> [1] "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"

basename(qry2)
#> [1] "USD&interval=1min&apikey=demo&source=docs"
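A minimal sketch of a check that would flag the first case but not the second, assuming the usual set of characters Windows reserves in file names (< > : " / \ | ? *, ignoring control characters). has_windows_illegal_chars() is a hypothetical helper for illustration, not part of RcppSimdJson:

```r
# Hypothetical helper: TRUE if a file name contains a character that Windows
# rejects in file names (< > : " / \ | ? *). Control characters are ignored here.
has_windows_illegal_chars <- function(x) {
  grepl('[<>:"/\\\\|?*]', x, perl = TRUE)
}

qry1 <- "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
qry2 <- "https://api.twelvedata.com/time_series?symbol=EUR/USD&interval=1min&apikey=demo&source=docs"

has_windows_illegal_chars(basename(qry1))  # TRUE: the '?' survives basename()
has_windows_illegal_chars(basename(qry2))  # FALSE: the '?' sits before the last '/'
```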

Very, very nicely diagnosed.

There must be some sort of filename normalizer we should have called here but haven't. We'll add this. As luck will have it, we of course just made a release two days ago :-/ but I will look into squashing this.

I thought there was a base R function with a name like *sanitize*, but I am not seeing it. What does come up is in package fs, which, on the margin, we'd rather not depend upon. Can you check whether a path like this (relative to your R session's tempdir()) would work?

> fs::path_sanitize("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")
[1] "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs"
> 

If so then it would appear that we can fix this with a single call to gsub().
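A sketch of what that single gsub() call could look like, assuming we replace the reserved Windows characters rather than reproduce everything fs::path_sanitize() does (it also handles control characters and reserved names, which this does not). sanitize_filename() is a hypothetical name; with an empty replacement it gives the same result as the fs::path_sanitize() output above for this particular string:

```r
# Hypothetical helper: replace every character Windows rejects in file names
# (< > : " / \ | ? *) with `replacement`.
sanitize_filename <- function(x, replacement = "_") {
  gsub('[<>:"/\\\\|?*]', replacement, x, perl = TRUE)
}

sanitize_filename("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")
#> [1] "time_series_symbol=VTI&interval=1min&apikey=demo&source=docs"

sanitize_filename("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs", "")
#> [1] "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs"
```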

Or, come to think about it, maybe just call tempfile() instead.

@knapply, thoughts on just relying on tempfile() for the destination file? It should have all the required logic for passing on every architecture and OS, avoiding data races, name clashes and what not. Shall we do that?

Yeah, it works:

> fs::path_sanitize("time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")
[1] "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs"

So we are in fact already using tempfile() on the set of results, we are "just" lacking this transformation:

        temp_files[diagnosis$is_from_url] <- tempfile(
            pattern = .drop_file_ext(basename(diagnosis$input[diagnosis$is_from_url]), diagnosis$file_ext[diagnosis$is_from_url]),
            tmpdir = normalizePath(temp_dir),
            fileext = .fileext[diagnosis$is_from_url]
        )

and this is right where we meet your new best friend basename() ...
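A self-contained sketch of where the sanitizing step could slot in, assuming it is applied to the basename() output before it becomes the tempfile() pattern. The urls vector stands in for the URL inputs seen by .prep_input(), and the ".json" extension is a hypothetical placeholder for the real .fileext values; this is not the package's actual patch:

```r
# Hypothetical stand-in for the URL inputs that .prep_input() sees.
urls <- c(
  "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs",
  "https://api.twelvedata.com/time_series?symbol=EUR/USD&interval=1min&apikey=demo&source=docs"
)

# Sanitize basename() output before using it as the tempfile() pattern;
# tempfile() then only appends a random hex string and the file extension.
patterns <- gsub('[<>:"/\\\\|?*]', "_", basename(urls), perl = TRUE)

dest <- tempfile(pattern = patterns, tmpdir = normalizePath(tempdir()), fileext = ".json")

# No destination file name contains a '?' any more.
any(grepl("?", basename(dest), fixed = TRUE))
#> [1] FALSE
```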

Actually, sorry, you just validated that you can call fs::path_sanitize(). Can you please feed the resulting string into dir.create() or similar, as in

> dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")))
> 

or even

> mydir <- dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs")))
Warning message:
In dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"))) :
  '/tmp/RtmpmcvhAh/time_series_symbol=VTI&interval=1min&apikey=demo&source=docs' already exists
> 

where I of course got a warning because I did it twice. Maybe feeding the string as pattern into tempfile() would have been smarter...
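A sketch of that tempfile() variant, assuming the same gsub("[?]", "_", ...) substitution used above: because tempfile() appends a random suffix and picks a path that does not exist yet, running it twice gives two distinct directories instead of the "already exists" warning. make_unique_dir() is a hypothetical helper:

```r
# Hypothetical helper: create a fresh directory whose name starts with a
# sanitized version of `name`; tempfile() supplies a unique random suffix.
make_unique_dir <- function(name, tmpdir = tempdir()) {
  path <- tempfile(pattern = gsub("[?]", "_", name), tmpdir = tmpdir)
  stopifnot(dir.create(path))
  path
}

name <- "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"
d1 <- make_unique_dir(name)
d2 <- make_unique_dir(name)  # a second call no longer collides with the first
c(dir.exists(d1), dir.exists(d2), d1 != d2)
```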

Apologies:

 (mydir <- dir.create(file.path(tempdir(),  "time_seriessymbol=VTI&interval=1min&apikey=demo&source=docs")))
[1] TRUE

# with illegal symbol
(mydir <- dir.create(file.path(tempdir(), gsub("[?]", "_", "time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"))))
[1] TRUE

Nice. That is looking promising. I made the simple change and sent it to win-builder, and I will paste the URL here once I hear back. You could then try the Windows build it produces (if you trust that R Project build machine, etc.).

Please try the binary from here:

Dear package maintainer,

this notification has been generated automatically.
Your package RcppSimdJson_0.1.4.1.tar.gz has been built (if working) and checked for Windows.
Please check the log files and (if working) the binary package at:
https://win-builder.r-project.org/oSsqYoSHo5kU
The files will be removed after roughly 72 hours.
Installation time in seconds: 208
Check time in seconds: 172
Status: 2 NOTEs
R version 4.0.3 (2020-10-10)

And just because there was a weird note in that log (about .github, which is not in our source), I also sent it to RHub:

See the full build log:
HTML: https://builder.r-hub.io/status/RcppSimdJson_0.1.4.1.tar.gz-363c5233bb4c4244a788ec4af93babd7
Text: https://builder.r-hub.io/status/original/RcppSimdJson_0.1.4.1.tar.gz-363c5233bb4c4244a788ec4af93babd7
Artifacts: https://artifacts.r-hub.io/RcppSimdJson_0.1.4.1.tar.gz-363c5233bb4c4244a788ec4af93babd7

The artifacts link has a binary for you too.

Excellent!!

qry1 <- "https://api.twelvedata.com/time_series?symbol=VTI&interval=1min&apikey=demo&source=docs"

RcppSimdJson::fload(qry1)
#> $meta
#> $meta$symbol
#> [1] "VTI"
#> 
#> $meta$interval
#> [1] "1min"
#> 
#> $meta$currency
#> [1] "USD"
#> 
#> $meta$exchange_timezone
#> [1] "America/New_York"
#> 
#> $meta$exchange
#> [1] "NYSE"
#> 
#> $meta$type
#> [1] "ETF"
#> 
#> 
#> $values
#>               datetime      open      high       low     close volume
#> 1  2021-02-12 15:59:00 207.42000 207.49001 207.37000 207.44000  67274
#> 2  2021-02-12 15:58:00 207.38000 207.41000 207.38000 207.41000  10739
#> 3  2021-02-12 15:57:00 207.35001 207.37000 207.32001 207.37000  14337
#> 4  2021-02-12 15:56:00 207.39500 207.39500 207.32430 207.32430   7079
#> 5  2021-02-12 15:55:00 207.35001 207.35001 207.28999 207.35001  23860
#> 6  2021-02-12 15:54:00 207.37000 207.40199 207.32001 207.35001  36724
#> 7  2021-02-12 15:53:00 207.28000 207.39000 207.28000 207.38000  15897
#> 8  2021-02-12 15:52:00 207.24001 207.31000 207.24001 207.27000   9655
#> 9  2021-02-12 15:51:00 207.19501 207.24001 207.17999 207.23000  13314
#> 10 2021-02-12 15:50:00 207.17000 207.20500 207.12000 207.20500  18703
#> 11 2021-02-12 15:49:00 207.03000 207.08501 207.02400 207.06960   9398
#> 12 2021-02-12 15:48:00 206.97501 207.03000 206.97501 207.01500   5336
#> 13 2021-02-12 15:47:00 206.96660 206.98500 206.96001 206.97501   9431
#> 14 2021-02-12 15:46:00 206.97000 206.98000 206.96201 206.96500   3145
#> 15 2021-02-12 15:45:00 206.98500 206.98981 206.98500 206.98981   4467
#> 16 2021-02-12 15:44:00 207.02000 207.03999 206.98500 206.98500  12226
#> 17 2021-02-12 15:43:00 207.01010 207.01430 207.00000 207.00500   6148
#> 18 2021-02-12 15:42:00 207.01500 207.03999 207.00999 207.01500   5203
#> 19 2021-02-12 15:41:00 206.99969 207.00500 206.97000 207.00000   6203
#> 20 2021-02-12 15:40:00 206.97501 207.03000 206.97501 207.01340  23694
#> 21 2021-02-12 15:39:00 206.97360 206.99001 206.95090 206.97501   5859
#> 22 2021-02-12 15:38:00 206.96899 206.99001 206.96500 206.98260  18887
#> 23 2021-02-12 15:37:00 206.89819 206.97501 206.88000 206.96001   6236
#> 24 2021-02-12 15:36:00 206.88000 206.92999 206.88000 206.92999   6444
#> 25 2021-02-12 15:35:00 206.78999 206.88000 206.78999 206.88000   5997
#> 26 2021-02-12 15:34:00 206.76500 206.78999 206.73640 206.78999  10608
#> 27 2021-02-12 15:33:00 206.77499 206.78999 206.75999 206.77000   6953
#> 28 2021-02-12 15:32:00 206.73500 206.78210 206.73500 206.78210   6500
#> 29 2021-02-12 15:31:00 206.73199 206.75000 206.73000 206.73000   6473
#> 30 2021-02-12 15:30:00 206.72000 206.73689 206.72000 206.73689   1131
#> 
#> $status
#> [1] "ok"

packageVersion("RcppSimdJson")
#> [1] '0.1.4.1'

Effing wicked. Thank you so much for the bug report, and for so promptly putting the finger on the wound.

I will close the ticket over at td, as that package is "innocent", but mark this in the README, and will upload a fixed RcppSimdJson real soon.

Likewise and always my pleasure.