omegahat / RCurl

Feeding a vector of urls to `url.exists` crashes Rgui and Rstudio

moodymudskipper opened this issue

According to the docs, `url.exists` expects a scalar input. I tried a vector anyway in case it was vectorized, and it crashed repeatedly in both the native console and RStudio:

RCurl::url.exists(c("http://www.google.com","http://www.google.com"))

I believe making it vectorized would be useful, or at least displaying a helpful error.
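As an illustration of the "helpful error" option, here is a minimal sketch of a guard on the input. The wrapper name `url_exists_scalar` and its `check` argument are hypothetical and not part of RCurl; `check` defaults to `RCurl::url.exists` and is only evaluated when the input passes validation.

```r
# Hypothetical wrapper (not part of RCurl): reject non-scalar input
# with an informative R error before anything reaches the checker.
url_exists_scalar <- function(url, check = RCurl::url.exists, ...) {
  if (!is.character(url) || length(url) != 1L || is.na(url)) {
    stop("`url` must be a single non-missing character string, got ",
         class(url)[1L], " of length ", length(url), call. = FALSE)
  }
  # Input is a valid scalar, so delegate to the real check.
  check(url, ...)
}
```

Called with a vector of two URLs, this stops with an ordinary R error rather than passing the vector on to the underlying function.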

Thanks Antoine for the issue. At least the docs say it should be a scalar. Do you actually get a "crash", i.e. a segmentation fault/bus error, or an R error? I don't. I just get back a single scalar value which is FALSE.

The interesting thing about vectorizing this is that ideally it would use multicurl to do the requests in parallel. I'll add that when I get a chance.

It did shut down my Rgui with this system (shut down without any message):

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

And crashed Rstudio (R Session aborted, R encountered a fatal error. The session was terminated) with this system :

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Thanks!

Pushed a version of url.exists() that does a simple vectorization of inputs.

Thanks for the info re. the real crash. Unfortunately, others provide binaries that are not necessarily up to date with the version on github and that also use other versions of libcurl. So it is hard to track down.

Thanks Duncan for reacting so quickly.

A couple suggestions :

  • sapply returns a named output by default when given a character vector, so your fix introduces a small inconsistency with the scalar case.
  • url.exists takes a bit of time, so on a vector with non-unique urls it would be more efficient to check only the unique inputs, maybe something like
    return(unname(sapply(unique(url), url.exists, curl = curl, .header = .header)[url]))
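The two suggestions above can be sketched as a standalone function. This is only an illustration under stated assumptions: the name `url_exists_vec` and the `check` argument are hypothetical, it assumes the checker returns a single logical per URL (as `url.exists` does with `.header = FALSE`), and it is not the code that was pushed to the repository.

```r
# Hypothetical helper (not part of RCurl): check each distinct URL once,
# then expand the results back to the original order, without names.
url_exists_vec <- function(urls, check = RCurl::url.exists, ...) {
  stopifnot(is.character(urls))
  uniq <- unique(urls)
  # vapply enforces one logical per URL; USE.NAMES = FALSE avoids
  # the named output that sapply would produce.
  res <- vapply(uniq, function(u) isTRUE(check(u, ...)),
                logical(1), USE.NAMES = FALSE)
  # match() maps every input back to the position of its unique value.
  res[match(urls, uniq)]
}
```

Using `match()` rather than indexing by name keeps the lookup independent of how the results are named, and the requests still run sequentially, so this does not address the parallel multi-curl approach discussed in the thread.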

Thanks. Indeed I'm aware of both issues. As I mentioned, the right way to do this is on a unique vector with asynchronous requests via multi-curl. Don't have time to do that right now.