simonpcouch / anyflights

An R package to generate `nycflights13`-like air travel data🛩️

Home Page: https://simonpcouch.github.io/anyflights/

utils::unzip error "cannot open file" in get_planes()

ercbk opened this issue

pdxflights19 <- anyflights("PDX", 2019, 6)
                                Total Time Elapsed          
Finished Processing Arguments                   1s          
Downloaded Flights Data for June               44s          
Finished Downloading Flights Data              52s          
Finished Downloading Airlines Data             53s          
  Downloading Planes...                           Error in utils::unzip(planes_tmp, exdir = planes_lcl, junkpaths = TRUE) : 
  cannot open file 'C:/Users/tbats/AppData/Local/Temp/RtmpGs7bI0/planes/MASTER.txt': Invalid argument

Master.txt is in that directory, but I don't know what the problem is.

current session info
- Session info ------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2020-08-29                  

- Packages ----------------------------------------------------------------------------------------------
 package     * version    date       lib source                            
 anyflights  * 0.3.0      2020-08-10 [1] CRAN (R 3.6.3)                    
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.1)                    
 backports     1.1.6      2020-04-05 [1] CRAN (R 3.6.3)                    
 bit           1.1-15.2   2020-02-10 [1] CRAN (R 3.6.2)                    
 bit64         0.9-7      2017-05-08 [1] CRAN (R 3.6.0)                    
 broom         0.5.5      2020-02-29 [1] CRAN (R 3.6.3)                    
 cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.6.1)                    
 cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.3)                    
 colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.6.3)                    
 crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.1)                    
 curl          4.3        2019-12-02 [1] CRAN (R 3.6.2)                    
 DBI           1.1.0      2019-12-15 [1] CRAN (R 3.6.2)                    
 dbplyr        1.4.2      2019-06-17 [1] CRAN (R 3.6.1)                    
 dplyr       * 1.0.1      2020-07-31 [1] CRAN (R 3.6.3)                    
 ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)                    
 fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.2)                    
 forcats     * 0.5.0      2020-03-01 [1] CRAN (R 3.6.3)                    
 fs            1.4.1      2020-04-04 [1] CRAN (R 3.6.2)                    
 generics      0.0.2      2018-11-29 [1] CRAN (R 3.6.1)                    
 ggplot2     * 3.3.0.9000 2020-04-04 [1] Github (tidyverse/ggplot2@bca6105)
 glue          1.4.1      2020-05-13 [1] CRAN (R 3.6.3)                    
 gtable        0.3.0      2019-03-25 [1] CRAN (R 3.6.1)                    
 haven         2.2.0      2019-11-08 [1] CRAN (R 3.6.2)                    
 hms           0.5.3      2020-01-08 [1] CRAN (R 3.6.2)                    
 httr          1.4.1      2019-08-05 [1] CRAN (R 3.6.1)                    
 jsonlite      1.7.0      2020-06-25 [1] CRAN (R 3.6.3)                    
 lattice       0.20-38    2018-11-04 [2] CRAN (R 3.6.2)                    
 lifecycle     0.2.0      2020-03-06 [1] CRAN (R 3.6.3)                    
 lubridate     1.7.4      2018-04-11 [1] CRAN (R 3.6.1)                    
 magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.1)                    
 modelr        0.1.6      2020-02-22 [1] CRAN (R 3.6.3)                    
 munsell       0.5.0      2018-06-12 [1] CRAN (R 3.6.1)                    
 nlme          3.1-145    2020-03-04 [1] CRAN (R 3.6.3)                    
 pillar        1.4.3      2019-12-20 [1] CRAN (R 3.6.2)                    
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.1)                    
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.2)                    
 progress      1.2.2      2019-05-16 [1] CRAN (R 3.6.1)                    
 purrr       * 0.3.3      2019-10-18 [1] CRAN (R 3.6.2)                    
 R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.2)                    
 Rcpp          1.0.5      2020-07-06 [1] CRAN (R 3.6.2)                    
 readr       * 1.3.1      2018-12-21 [1] CRAN (R 3.6.1)                    
 readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.1)                    
 reprex        0.3.0      2019-05-16 [1] CRAN (R 3.6.1)                    
 rlang         0.4.7      2020-07-09 [1] CRAN (R 3.6.3)                    
 rstudioapi    0.11       2020-02-07 [1] CRAN (R 3.6.3)                    
 rvest         0.3.5      2019-11-08 [1] CRAN (R 3.6.2)                    
 scales        1.1.0      2019-11-18 [1] CRAN (R 3.6.2)                    
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.1)                    
 stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.2)                    
 stringr     * 1.4.0      2019-02-10 [1] CRAN (R 3.6.1)                    
 tibble      * 3.0.0      2020-03-30 [1] CRAN (R 3.6.2)                    
 tidyr       * 1.0.2      2020-01-24 [1] CRAN (R 3.6.2)                    
 tidyselect    1.1.0      2020-05-11 [1] CRAN (R 3.6.3)                    
 tidyverse   * 1.3.0      2019-11-21 [1] CRAN (R 3.6.2)                    
 vctrs         0.3.2      2020-07-15 [1] CRAN (R 3.6.3)                    
 vroom         1.3.1      2020-08-27 [1] CRAN (R 3.6.2)                    
 withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.1)                    
 xml2          1.2.5      2020-03-11 [1] CRAN (R 3.6.3)                    

[1] C:/Users/tbats/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.2/library

Hmmm.. really strange. Thanks for the thoroughness!

I haven't been able to replicate this in any of the environments I have access to. Will come back to this in a few days and see if anything jumps out at me, but definitely let me know if you have any hunches on where this might be coming from.

When you say "Master.txt is in that directory", do you mean MASTER.txt, or is the file actually capitalized as "Master.txt"? I've seen the file naming/placement for the planes data changing in some years, but I'd be surprised if it also changed as a function of OS, R version, package versions, etc. That, or some sort of read/write permissions issue is all I can come up with at this point.

Apologies. I meant MASTER.txt. Anything specific I can do to test out the read/write thing?
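One way to test the read/write idea is a quick round trip in the session's temporary directory. The sketch below is only an illustration: `check_dir_rw()` is a hypothetical helper, not part of anyflights, and it assumes the directory of interest is `tempdir()`.

```r
# Hypothetical helper (not part of anyflights): checks whether the current
# R session can write, read back, and delete a file in a directory.
check_dir_rw <- function(dir = tempdir()) {
  probe <- tempfile(pattern = "rwcheck_", tmpdir = dir, fileext = ".txt")

  # Try to write a small probe file; treat any error or warning as failure.
  wrote <- tryCatch({
    writeLines("probe", probe)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)

  # Read the probe back and compare with what was written.
  read_back <- wrote && identical(readLines(probe), "probe")

  # unlink() returns 0 on success and 1 on failure.
  deleted <- wrote && unlink(probe) == 0 && !file.exists(probe)

  c(
    dir_exists   = dir.exists(dir),
    # file.access() returns 0 when the requested access (2 = write) is allowed.
    dir_writable = unname(file.access(dir, mode = 2) == 0),
    wrote        = wrote,
    read_back    = read_back,
    deleted      = deleted
  )
}

print(check_dir_rw())
```

If every element comes back `TRUE`, plain permissions on the temp directory are probably not the cause. The write/read/delete round trip is the more telling part of the check, since `file.access()` only reports what the operating system claims is allowed.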

No worries! Thanks for the clarification.

Hmm.. are you on a local setup with administrator permissions? I don't fully understand how tempfile()s (where that MASTER.txt file is actually saved) work in the backend, but can't imagine they'd be generated where they can't be fully accessed.

I'm local. Where are these planes_tmp and planes_lcl file/dir located or are those already cleaned up?

planes_tmp and planes_lcl are temporary files/directories that get_planes() uses while it runs.
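To see whether anything is left over from a failed run, the session's temporary directory can be inspected directly. This is a minimal sketch that assumes the extraction directory is `tempdir()/planes`, which is what the path in the error message above suggests; `planes_dir` is a name made up for this example.

```r
# Assumption: the planes data is extracted to a "planes" folder inside the
# session's temporary directory, as the path in the error message suggests.
planes_dir <- file.path(tempdir(), "planes")

if (dir.exists(planes_dir)) {
  # List whatever is left in the folder, with size and modification time.
  files <- list.files(planes_dir, full.names = TRUE)
  print(file.info(files)[, c("size", "mtime")])
} else {
  message("No planes directory in this session's tempdir: ", planes_dir)
}
```

The session temp directory is removed when R exits, so this only shows something useful in the same session where the error occurred.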

These are the lines that seem to be giving you trouble. After setting the following values, does running that code go smoothly for you?

year <- 2019
dir <- tempdir()
flights_data <- NULL

Yep, no problems.

It's the unzip inside process_planes_ref() where I get the error.

I bolded the line where the traceback highlighted the approximate location, but the indents are screwed up

function (zipfile, files = NULL, list = FALSE, overwrite = TRUE, 
      junkpaths = FALSE, exdir = ".", unzip = "internal", setTimes = FALSE) 
{
      if (identical(unzip, "internal")) {
            if (!list && !missing(exdir)) 
                  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
            res <- .External(C_unzip, zipfile, files, exdir, list, 
                  overwrite, junkpaths, setTimes)
            if (list) {
                  dates <- as.POSIXct(res[[3]], "%Y-%m-%d %H:%M", 
                        tz = "UTC")
                  data.frame(Name = res[[1]], Length = res[[2]], Date = dates, 
                        stringsAsFactors = FALSE)
            }
            else invisible(attr(res, "extracted"))
      }
      else {
            WINDOWS <- .Platform$OS.type == "windows"
            if (!is.character(unzip) || length(unzip) != 1L || !nzchar(unzip)) 
                  stop("'unzip' must be a single character string")
            zipfile <- path.expand(zipfile)
            if (list) {
                  res <- if (WINDOWS) 
                        system2(unzip, c("-ql", shQuote(zipfile)), stdout = TRUE)
                  else system2(unzip, c("-ql", shQuote(zipfile)), 
                        stdout = TRUE, env = c("TZ=UTC"))
                  l <- length(res)
                  res2 <- res[-c(2, l - 1, l)]
                  res3 <- gsub(" *([^ ]+) +([^ ]+) +([^ ]+) +(.*)", 
                        "\\1 \\2 \\3 \"\\4\"", res2)
                  con <- textConnection(res3)
                  on.exit(close(con))
                  z <- read.table(con, header = TRUE, as.is = TRUE)
                  dt <- paste(z$Date, z$Time)
                  formats <- if (max(nchar(z$Date) > 8)) 
                        c("%Y-%m-%d", "%d-%m-%Y", "%m-%d-%Y")
                  else c("%m-%d-%y", "%d-%m-%y", "%y-%m-%d")
                  slash <- any(grepl("/", z$Date))
                  if (slash) 
                        formats <- gsub("-", "/", formats)
                  formats <- paste(formats, "%H:%M")
                  for (f in formats) {
                        zz <- as.POSIXct(dt, tz = "UTC", format = f)
                        if (all(!is.na(zz))) 
                              break
                  }
                  z[, "Date"] <- zz
                  z[c("Name", "Length", "Date")]
            }
            else {
                  args <- character()
                  if (junkpaths) 
                        args <- c(args, "-j")
                  if (overwrite) 
                        args <- c(args, "-oq", shQuote(zipfile))
                  else args <- c(args, "-nq", shQuote(zipfile))
                  if (length(files)) 
                        args <- c(args, shQuote(files))
                  if (exdir != ".") 
                        args <- c(args, "-d", shQuote(exdir))
                  if (WINDOWS) 
                        system2(unzip, args, stdout = NULL, stderr = NULL, 
                              invisible = TRUE)
                  else system2(unzip, args, stdout = NULL, stderr = NULL)
                  invisible(NULL)
            }
      }
}

I think I got something. MASTER.txt doesn't get deleted by unlink in process_planes_master(). When I try to delete it manually, it says it's open in RStudio, and if I try to manually replace it with another MASTER.txt file, it errors and says it can't be done with a user-mapped section open (whatever that means). I'm guessing this is what process_planes_ref() runs into when it tries to do its thing. Something about vroom is keeping that file active, I guess.
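A failed `unlink()` is easy to miss because it reports failure only through its invisible return value. The sketch below shows one way to surface that; `remove_or_report()` is a hypothetical helper written for this illustration, not something anyflights or vroom provides.

```r
# Hypothetical helper: try to delete a file and report whether it is
# really gone, instead of relying on unlink() failing silently.
remove_or_report <- function(path) {
  status <- unlink(path)  # 0 = success, 1 = failure (returned invisibly)
  gone <- status == 0 && !file.exists(path)
  if (!gone) {
    warning("Could not delete '", path,
            "'; something may still have the file open.")
  }
  gone
}

# Usage with a throwaway file that nothing else has open.
tmp <- tempfile(fileext = ".txt")
writeLines("placeholder", tmp)
remove_or_report(tmp)
```

On a file that another handle still holds open, such as the MASTER.txt described above, this would be expected to return `FALSE` with a warning on Windows rather than passing quietly.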

This issue seems to be what I'm dealing with. I have the latest version of vroom, 1.3.1, so I dunno.

Oof... I appreciate all of your energy in tracking this down.

I'm going to let this issue sit for a bit and see if anyone runs into the same issue.