[BUG] error in data reading
SanderDevisscher opened this issue · comments
Describe the bug
Something is still not right. I can build the docker image, but according to the local version of the app, data is only available up to 31/12/2022. When I look at the UAT on faunabeheer-dev.inbo.be, however, I see that data is available up to 05/08/2023. I suspect the current version reads the data from the wrong source.
To Reproduce
Steps to reproduce the behavior:
- Build the public docker image locally
- Go to localhost
- Go to faunabeheer-dev.inbo.be
- Compare the dates from both and see the error
Expected behavior
Both apps are built from the same data sources
Screenshots
Local version (0.3.0)
UAT version (0.2.9)
Additional context
Current UAT hash: 9c2fffa
I'm not sure where the data for faunabeheer-dev.inbo.be comes from. It should be defined in the shinyproxy yaml, which I don't have access to: #386 (comment)
The data in bucket inbo-wbe-uat-data indeed has a maximum date of 31/12/2022.
library(reportingGrofwild)
setupS3()
ecoData <- loadRawData(type = "eco")
max(ecoData$afschot_datum, na.rm = TRUE)
[1] "2022-12-31"
@berthuygens @TheJenne18 can you look into the data source of faunabeheer-dev.inbo.be?
@TheJenne18 verified the UAT app gets its data from the UAT bucket (& PRD from PRD bucket).
But I suspect the two versions read different files from the UAT bucket. There are currently 3 ecology files on the bucket, namely:
- "rshiny_reporting_data_ecology.csv" with max(df$afschot_datum, na.rm = TRUE) == "2023-07-19" => no longer updated, in favor of the reportingGrofwild preprocessing function createRawData().
- "rshiny_reporting_data_ecology_processed.csv" with max(df$afschot_datum, na.rm = TRUE) == "2023-08-05"
- "rshiny_reporting_data_ecology_processed.RData", which I'm unable to load using aws.s3::s3load(). It always returns NULL.
I suspect the UAT reads from "rshiny_reporting_data_ecology_processed.csv" while the local docker file reads from "rshiny_reporting_data_ecology_processed.RData".
Additionally, when running the code you provided (excluding setupS3()), max(ecoData$afschot_datum, na.rm = TRUE) yields "2023-08-05" using reportingGrofwild v0.2.9, but the same code yields "2022-12-31" using reportingGrofwild v0.3.0. This suggests that something substantial about how the data is loaded changed between versions.
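To narrow down which file each version reads, it may help to compare the latest date per candidate file side by side. Below is a minimal sketch on toy local CSV files. The helper maxDatePerFile() and the file names old.csv and new.csv are hypothetical and not part of reportingGrofwild. On the real bucket the files would first have to be read from S3.

```r
# Hypothetical helper (not part of reportingGrofwild):
# returns the latest date found in each CSV file
maxDatePerFile <- function(files, dateColumn = "afschot_datum") {
  vapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    as.character(max(as.Date(df[[dateColumn]]), na.rm = TRUE))
  }, character(1))
}

# Toy stand-ins for two files on the bucket (made-up content)
dataDir <- tempfile("bucket")
dir.create(dataDir)
oldFile <- file.path(dataDir, "old.csv")
newFile <- file.path(dataDir, "new.csv")
write.csv(data.frame(afschot_datum = c("2022-06-01", "2022-12-31")),
          oldFile, row.names = FALSE)
write.csv(data.frame(afschot_datum = c("2023-08-05", NA)),
          newFile, row.names = FALSE)

# Named character vector: one maximum date per file
result <- maxDatePerFile(c(oldFile, newFile))
print(result)
```

Running the same check under both package versions should show which file each version ends up reading.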
Thanks for explaining, I think I see the problem now.
The function createRawData() takes the file "rshiny_reporting_data_ecology.csv" as input and outputs "rshiny_reporting_data_ecology_processed.RData".
So the problem should be fixed if you rename your input file to rshiny_reporting_data_ecology.csv and run the createRawData() function again.
I suspect similar changes are needed to the other input raw data files, listed here
On a side note, regarding:
3. "rshiny_reporting_data_ecology_processed.RData" which I'm unable to load using aws.s3::s3load(). It always returns NULL
Each .RData file contains an object called rawData, which can then be assigned to another variable depending on the data type.
So, this should work:
aws.s3::s3load(bucket = "inbo-wbe-uat-data", object = "rshiny_reporting_data_ecology_processed.RData")
dim(rawData)
ecoData <- rawData
dim(ecoData)
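The same pattern can be tried locally without S3 access. The sketch below uses base R's save() and load() on a toy data frame, on the assumption that aws.s3::s3load() behaves like load() in this respect: the objects are created in an environment and the data itself is not returned. The toy content and the names rdataFile and loadEnv are made up for illustration.

```r
# Toy stand-in for an .RData file from the bucket (made-up content)
rawData <- data.frame(afschot_datum = as.Date(c("2023-07-19", "2023-08-05")))
rdataFile <- tempfile(fileext = ".RData")
save(rawData, file = rdataFile)
rm(rawData)

# load() returns the names of the restored objects (invisibly),
# not the data itself; the objects are created in 'envir'
loadEnv <- new.env()
loadedNames <- load(rdataFile, envir = loadEnv)
print(loadedNames)

# Retrieve the object and give it a name that matches the data type
ecoData <- get(loadedNames[1], envir = loadEnv)
print(max(ecoData$afschot_datum, na.rm = TRUE))
```

Loading into a separate environment also avoids silently overwriting an existing rawData object in the workspace.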
Using reportingGrofwild v0.3.0 I get some errors when updating the wnm (waarnemingen) and spread data. I'll add the error messages after fieldwork.
When running createSpreadData() I get the following error right after loading "Municipalities_ModelOutput_toekomst_verspreiding_2023.shp":
Error in `[.data.frame`(x@data, i, j, ..., drop = FALSE) :
undefined columns selected
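For reference, this message is what base R raises when a data frame is subset by column names it does not contain. The sketch below reproduces it on a toy data frame and shows a check that lists the missing columns first. The column names are hypothetical and do not come from the shapefile. Which columns createSpreadData() actually selects is not shown in this thread.

```r
# Toy attribute table (hypothetical column names, not taken from the shapefile)
shapeData <- data.frame(NAAM = c("A", "B"), pred_2023 = c(0.1, 0.9))
expectedCols <- c("NAAM", "pred_2023", "risk_class")

# Selecting a column that is absent triggers the error seen above
subsetError <- tryCatch(
  shapeData[, expectedCols, drop = FALSE],
  error = function(e) conditionMessage(e)
)
print(subsetError)

# Listing the missing columns up front gives a more useful diagnostic
missingCols <- setdiff(expectedCols, names(shapeData))
print(missingCols)
```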
When running
createRawData(dataDir = tempdir_wnm,
type = "waarnemingen",
bucket = "inbo-wbe-uat-data")
I get the following error:
Error in `[.data.table`(x, i, which = TRUE) :
When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Running both functions with reportingGrofwild 0.2.9 results in no errors.
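The data.table message above can be reproduced in isolation. The sketch below joins two toy tables that have no key and no shared column names, and then names the join columns with on=. The tables and column names are made up. The actual change made in the package is not shown in this thread.

```r
library(data.table)

# Toy tables (hypothetical columns): x has no key and shares no names with i
x <- data.table(id = c(1L, 2L, 3L), soort = c("a", "b", "c"))
i <- data.table(ref_id = c(2L, 3L))

# Without a key on x or an 'on=' argument, data.table cannot tell
# which columns to join by and stops with an error
joinError <- tryCatch(x[i], error = function(e) conditionMessage(e))
print(joinError)

# Naming the join columns explicitly resolves this
joined <- x[i, on = c(id = "ref_id")]
print(joined)
```

Setting a key on x with setkey() would be an alternative, but an explicit on= keeps the join columns visible at the call site.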
Could you share the input files you use for these two? With the files I have it works.
These are the input files I used:
Sorry for the delay.
- For waarnemingen: I had to update the code to fix the error.
- For spread data: The files in the zip folder are named differently from before. E.g. new "Model_output_Pixels" versus old "Pixels_ModelOutput_toekomst_verspr". @SanderDevisscher I can update the code for this unless you will use the old names in the future?
For both: I've updated the data in the UAT bucket wrt the files you shared
Sorry for the delay.
- For waarnemingen: I had to update the code to fix the error.
I'll test the changes.
- For spread data: The files in the zip folder are named differently from before. E.g. new "Model_output_Pixels" versus old "Pixels_ModelOutput_toekomst_verspr". @SanderDevisscher I can update the code for this unless you will use the old names in the future?
No need, I'll change the filenames on our side.
For both: I've updated the data in the UAT bucket wrt the files you shared
Updating waarnemingen works as it should.
My sessionInfo():
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Belgium.utf8 LC_CTYPE=Dutch_Belgium.utf8 LC_MONETARY=Dutch_Belgium.utf8 LC_NUMERIC=C LC_TIME=Dutch_Belgium.utf8
time zone: Europe/Brussels
tzcode source: internal
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] testthat_3.1.10 sp_2.0-0 reportingGrofwild_0.3.0 lubridate_1.9.2.9000 forcats_1.0.0 stringr_1.5.0
[7] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.3
[13] tidyverse_2.0.0 janitor_2.2.0 aws.s3_0.3.21 svDialogs_1.1.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7 magrittr_2.0.3 rmarkdown_2.24 fs_1.6.2 vctrs_0.6.2
[8] memoise_2.0.1 config_0.3.1 base64enc_0.1-3 webshot_0.5.5 htmltools_0.5.5 usethis_2.1.6 curl_5.0.2
[15] sass_0.4.7 KernSmooth_2.23-20 bslib_0.5.1 desc_1.4.2 htmlwidgets_1.6.2 plyr_1.8.8 plotly_4.10.2
[22] cachem_1.0.8 conflicted_1.2.0 mime_0.12 lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.6-0 R6_2.5.1
[29] fastmap_1.1.1 shiny_1.7.5 snakecase_0.11.0 digest_0.6.31 showtext_0.9-6 colorspace_2.1-0 shinycssloaders_1.0.0
[36] ps_1.7.5 rprojroot_2.0.3 pkgload_1.3.2.1 aws.signature_0.6.0 crosstalk_1.2.0 fansi_1.0.4 timechange_0.2.0
[43] httr_1.4.7 mgcv_1.8-42 compiler_4.3.0 proxy_0.4-27 remotes_2.4.2.1 bit64_4.0.5 withr_2.5.0
[50] DBI_1.1.3 pkgbuild_1.4.0 aws.ec2metadata_0.2.0 sessioninfo_1.2.2 leaflet_2.1.2 classInt_0.4-9 units_0.8-3
[57] odbc_1.3.4 httpuv_1.6.11 glue_1.6.2 callr_3.7.3 nlme_3.1-162 promises_1.2.0.1 grid_4.3.0
[64] sf_1.0-14 reshape2_1.4.4 generics_0.1.3 gtable_0.3.4 tzdb_0.4.0 class_7.3-21 data.table_1.14.8
[71] hms_1.1.3 xml2_1.3.5 utf8_1.2.3 pillar_1.9.0 vroom_1.6.3 later_1.3.1 splines_4.3.0
[78] lattice_0.21-8 showtextdb_3.0 bit_4.0.5 tidyselect_1.2.0 miniUI_0.1.1.1 knitr_1.43 xfun_0.40
[85] devtools_2.4.5 brio_1.1.3 DT_0.29 stringi_1.7.12 lazyeval_0.2.2 yaml_2.3.7 geojsonsf_2.0.3
[92] evaluate_0.21 cli_3.6.1 xtable_1.8-4 munsell_0.5.0 processx_3.8.1 jquerylib_0.1.4 svGUI_1.0.1
[99] Rcpp_1.0.10 parallel_4.3.0 ellipsis_0.3.2 assertthat_0.2.1 blob_1.2.4 prettyunits_1.1.1 profvis_0.3.8
[106] urlchecker_1.0.1 flexdashboard_0.6.2 INBOtheme_0.5.9 viridisLite_0.4.2 scales_1.2.1 fortunes_1.5-4 sysfonts_0.8.8
[113] e1071_1.7-13 crayon_1.5.2 rlang_1.1.1 waldo_0.5.1
@mvarewyck any idea which package is the culprit?
I'll need to investigate more. I now get another error
Reading layer `Model_output_Municipalities_2023' from data source
`/home/mvarewyck/git/reporting-rshiny-grofwildjacht/data/Model_output_Municipalities_2023.shp'
using driver `ESRI Shapefile'
Simple feature collection with 308 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.541329 ymin: 50.68749 xmax: 5.911109 ymax: 51.50511
Geodetic CRS: GCS_WGS_84_with_axis_order_normalized_for_visualization
Error: arguments have different crs
A similar issue (different data availability between 0.3.0 & 0.2.9, where the latter is correct and up to date, as described in #414 (comment)) occurs with the Wildschade data.
What is the correct name of the wildschade_georef file used in createRawData()? I now provide WildSchade_georef.csv
WildSchade_georef.csv is the correct name.
I've made spatialData a compulsory function argument to preprocess the spread data. This fixes #414 (comment)
readS3(file = "spatialData_sf.RData")
createSpreadData(spatialData = spatialData)
@SanderDevisscher If you send me the raw data file for wildschade, I'll have a look at it.
The raw data file exceeds 25 MB, so I had to send it via WeTransfer. You should have received an email.