sparklyr / sparklyr

R interface for Apache Spark

Home Page: https://spark.rstudio.com/

`spark_install()` installs unexpected version when patch version is not specified

cjyetman opened this issue

Following the instructions in Mastering Spark with R, section 2.2.2 "Installing Spark", I ran into what might be a minor bug, or at least some unexpected behavior.

The book asks you to install version 2.3 so that your setup matches its examples, and suggests doing this with spark_install("2.3"), presumably assuming that the most recent 2.3.x patch version will be determined automatically. However, because a 3.2.3 version also exists, spark_install("2.3") installs version 3.2.3 instead.

library(sparklyr)
spark_install("2.3")
spark_installed_versions()
#>   spark hadoop                                          dir
#> 1 2.4.3    2.7 /Users/cjrmi/spark/spark-2.4.3-bin-hadoop2.7
#> 2 3.2.3    3.2 /Users/cjrmi/spark/spark-3.2.3-bin-hadoop3.2

Digging deeper, this happens because of how spark_install_find() and sparklyr:::spark_install_version_expand() currently work...

library(sparklyr)

desired_version <- "2.3"

spark_install_find(
  version = desired_version,
  hadoop_version = NULL,
  installed_only = FALSE,
  latest = TRUE
)$sparkVersion
#> [1] "3.2.3"

sparklyr:::spark_install_version_expand(version = desired_version, installed_only = FALSE)
#> [1] "3.2.3"

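This somewhat unexpected behavior comes from this part of the version-expansion code, where the requested version is passed to grepl() as an unanchored pattern: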

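# excerpt: the `}` closes an enclosing block that is not shown here
versions <- spark_available_versions(show_minor = TRUE)$spark
}
# "2.3" can match anywhere in the string, so "2.2.3" and "3.2.3" match too
versions <- versions[grepl(version, versions)]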
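
To see why the unanchored pattern ends up selecting 3.2.3, here is a small self-contained sketch that runs the same grepl() call on a hard-coded vector instead of calling spark_available_versions(), so it needs no network access. The vector is an illustrative stand-in, not the full list sparklyr returns, and the last step assumes that the "latest" match means the highest version number. That assumption is consistent with the 3.2.3 result above but has not been checked against the sparklyr source.

```r
# Illustrative stand-in for spark_available_versions(show_minor = TRUE)$spark
versions <- c("2.2.3", "2.3.0", "2.3.1", "2.3.2", "2.3.3", "2.3.4",
              "2.4.3", "3.2.3")
version <- "2.3"

# Unanchored pattern: the substring "2.3" also occurs in "2.2.3" and "3.2.3"
matches <- versions[grepl(version, versions)]
print(matches)
#> [1] "2.2.3" "2.3.0" "2.3.1" "2.3.2" "2.3.3" "2.3.4" "3.2.3"

# Assumption: the "latest" candidate is the highest version number
latest <- as.character(max(numeric_version(matches)))
print(latest)
#> [1] "3.2.3"
```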

A minor change to the regex, anchoring it to the start of the string, could fix this (assuming the intent is to allow passing a version in [major].[minor] form):

library(sparklyr)
#> 
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:stats':
#> 
#>     filter

version <- "2.3"

versions <- spark_available_versions(show_minor = TRUE)$spark
versions[grepl(version, versions)]
#> [1] "2.2.3" "2.3.0" "2.3.1" "2.3.2" "2.3.3" "2.3.4" "3.2.3"

versions <- spark_available_versions(show_minor = TRUE)$spark
versions[grepl(paste0("^", version), versions)]
#> [1] "2.3.0" "2.3.1" "2.3.2" "2.3.3" "2.3.4"
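
Anchoring with "^" fixes this example. However, "." in a regex matches any character, so "^2.3" would also match a hypothetical "2.30.0" or "203". The following is a minimal sketch of a stricter alternative, assuming the goal is "the newest patch release within the requested [major].[minor]". Instead of a regex, it uses a literal prefix check with startsWith() and compares candidates as version numbers. expand_spark_version() is a hypothetical helper for illustration only, not part of sparklyr, and the version vector is hard-coded.

```r
# Hypothetical helper (not part of sparklyr): expand "[major].[minor]" to the
# newest matching "[major].[minor].[patch]"; a full version is kept as-is
expand_spark_version <- function(version, versions) {
  # Literal prefix "2.3." (no regex), so "2.2.3", "3.2.3" and "2.30.0" do not match
  matches <- versions[startsWith(versions, paste0(version, ".")) |
                        versions == version]
  if (length(matches) == 0) {
    return(version)  # nothing matched: fall back to the request unchanged
  }
  # Compare as version numbers rather than as strings
  as.character(max(numeric_version(matches)))
}

versions <- c("2.2.3", "2.3.0", "2.3.1", "2.3.2", "2.3.3", "2.3.4",
              "2.4.3", "3.2.3")
expand_spark_version("2.3", versions)
#> [1] "2.3.4"
expand_spark_version("2.3.2", versions)
#> [1] "2.3.2"
```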
My session info:
library(sparklyr)
#> 
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:stats':
#> 
#>     filter
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Ventura 13.5.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2023-08-24
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 4.3.0)
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
#>  callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
#>  DBI           1.1.3   2022-06-18 [1] CRAN (R 4.3.0)
#>  dbplyr        2.3.3   2023-07-07 [1] CRAN (R 4.3.0)
#>  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
#>  dplyr         1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
#>  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
#>  httpuv        1.6.11  2023-05-11 [1] CRAN (R 4.3.0)
#>  httr          1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
#>  jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
#>  later         1.3.1   2023-05-02 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
#>  mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 4.3.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.3.0)
#>  processx      3.8.2   2023-06-30 [1] CRAN (R 4.3.0)
#>  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
#>  promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.0)
#>  ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.0)
#>  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  shiny         1.7.5   2023-08-12 [1] CRAN (R 4.3.0)
#>  sparklyr    * 1.8.2   2023-07-01 [1] CRAN (R 4.3.0)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
#>  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr         1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
#>  usethis       2.2.2   2023-07-06 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────