omegahat / RCurl

embedded nul in string

ChristelSwift opened this issue · comments

I'm trying to download an Excel file from an SFTP site. The file itself looks fine to me, but I'd have to skip a couple of lines to import it as a dataset. When I run:

data <- getURL(
  url = url, 
  userpwd = userpwd, 
  verbose = TRUE
  ) 

I get

* SSH authentication methods available: password
* Initialized password authentication
* Authentication complete
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : 
  embedded nul in string: 'PK\003\004\024'

Any idea what I can do to fix this?

Hi Christel
I'd try

tmp <- getURLContent(url = url, userpwd = userpwd, verbose = TRUE, binary = TRUE)
data = rawToChar(tmp)

Please let us know if that solves the problem.

Thank you. I tried that, but unfortunately I still got the same error:

> tmp <- getURLContent(url = url, userpwd = userpwd, verbose = TRUE, binary = TRUE)
*   Trying xxxxxx...
* TCP_NODELAY set
* Connected to sftp.xxxx.com (xxxx) port 22 (#0)
* SSH MD5 fingerprint: xxxxx
* SSH authentication methods available: password
* Initialized password authentication
* Authentication complete
Error in curlPerform(url = url, curl = curl, .opts = .opts) : 
  embedded nul in string: 'PK\003\004\024'

Can you please try

data = getBinaryURL(url = url, userpwd = userpwd, verbose = TRUE)

and hopefully that will work or give a different problem.
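
For context, here is a minimal sketch of why the text-based calls presumably fail, using only base R. An R character string cannot contain a nul byte, so any step that converts the downloaded bytes to text stops with this error, while a raw vector holds them without complaint. The `payload` vector is a hypothetical stand-in for the first few bytes of the download, chosen to match the `'PK\003\004\024'` prefix in the error message.

```r
# Hypothetical stand-in for the start of the downloaded file:
# "PK\003\004\024" followed by nul bytes.
payload <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00))

# Converting bytes that contain a nul to a character string fails;
# capture the error message instead of stopping.
msg <- tryCatch(
  {
    rawToChar(payload)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(msg)            # the "embedded nul in string" error text
print(is.raw(payload)) # the raw vector itself is perfectly valid
```

This is why keeping the result as a raw vector, as getBinaryURL() does, sidesteps the problem.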

It has imported, but it's in binary format, so it looks nothing like the original Excel file...

* SSH authentication methods available: password
* Initialized password authentication
* Authentication complete
* Failed to close libssh2 file: -31 SFTP Protocol Error
* Connection #0 to host sftp.grouptechedge.com left intact
> data
   [1] 50 4b 03 04 14 00 00 00 08 00 5b 8b 7e 57 1c 07 04 7d 2d 01 00 00 3a 02 00 00 11 00 00 00 64 6f 63 50 72 6f 70 73 2f
  [40] 63 6f 72 65 2e 78 6d 6c 8d 91 cd 4e c3 30 10 84 9f 80 77 88 7c 4f 36 4e d4 82 ac a6 95 00 f5 44 25 24 8a 40 dc 2c 7b
  [79] db 5a c4 3f b2 0d 69 df 1e 37 69 43 a5 72 e0 68 cf f8 db d9 f1 6c b1 d7 6d f6 8d 3e 28 6b 1a 42 8b 92 64 68 84 95 ca
 [118] 6c 1b f2 ba 5e e6 77 24 0b 91 1b c9 5b 6b b0 21 07 0c 64 31 bf 99 09 c7 84 f5 f8 ec ad 43 1f 15 86 2c 81 4c 60 c2 35

Can I convert this back into the original Excel file?

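Yes. The leading bytes 50 4b 03 04 are the zip signature "PK\003\004", and the bytes after the header spell docProps/core.xml, so this is the zip archive that makes up an xlsx file.
It will contain numerous XML files with a very specific format.
You can save the raw vector to a file (e.g. with writeBin()) and then use something like readxl::read_excel() to read it.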

(You can work with the zip archive directly with a package such as Rcompression and also with the XML files in the zip archive using a variety of packages.)
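
As a rough illustration of how to tell that the raw vector is a zip archive, the sketch below decodes the first 47 bytes copied from the hex dump printed above, using only base R. The field offsets follow the standard zip local file header layout (30 fixed bytes, then the entry name); the variable names are made up for this example.

```r
# First 47 bytes of the download, copied from the hex dump in this thread.
hex <- c("50", "4b", "03", "04", "14", "00", "00", "00", "08", "00",
         "5b", "8b", "7e", "57", "1c", "07", "04", "7d", "2d", "01",
         "00", "00", "3a", "02", "00", "00", "11", "00", "00", "00",
         "64", "6f", "63", "50", "72", "6f", "70", "73", "2f", "63",
         "6f", "72", "65", "2e", "78", "6d", "6c")
bytes <- as.raw(strtoi(hex, base = 16L))

# A zip local file header starts with the signature "PK\003\004".
is_zip <- identical(bytes[1:4], as.raw(c(0x50, 0x4b, 0x03, 0x04)))

# Bytes 27-28 hold the entry name length (little-endian);
# the name itself starts right after the 30-byte fixed header.
name_len <- as.integer(bytes[27]) + 256L * as.integer(bytes[28])
first_entry <- rawToChar(bytes[31:(30 + name_len)])

print(is_zip)
print(first_entry)
```

The first entry name is one of the XML parts that an xlsx workbook normally carries, which supports saving the bytes to a file and reading them with readxl.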

For future reference, this worked:

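library(RCurl)     # getBinaryURL()
library(magrittr)  # %>%
library(readxl)    # read_xlsx()

# Temporary file to hold the downloaded workbook
my_tmp_file = tempfile()

# Download the file as a raw vector and write the bytes to disk unchanged
getBinaryURL(
  url = my_url,
  userpwd = my_user_pwd,
  ftp.use.epsv = FALSE,
  crlf = TRUE
  ) %>%
  writeBin(con = my_tmp_file)

# Read the first sheet of the saved workbook
db = read_xlsx(
  path = my_tmp_file,
  sheet = 1
  )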
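
If you want to convince yourself that the save step does not alter the download, here is a small base R sketch of the round trip. It assumes nothing about RCurl: `bytes` is a hypothetical stand-in for the raw vector returned by the download, and the `.xlsx` file extension is an optional nicety rather than something the thread's solution required.

```r
# Hypothetical stand-in for the raw vector returned by the download.
bytes <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00))

# Write the bytes to a temporary file exactly as they are.
path <- tempfile(fileext = ".xlsx")
writeBin(bytes, path)

# Read the whole file back as raw and compare with the original vector.
round_trip <- readBin(path, what = "raw", n = file.size(path))
same_bytes <- identical(round_trip, bytes)

# Clean up the temporary file.
unlink(path)

print(same_bytes)
```

Because writeBin() stores a raw vector byte for byte, the file on disk is the same workbook that sits on the server, and read_xlsx() can open it as usual.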