inbo / reporting-rshiny-grofwildjacht

Rshiny app for grofwildjacht

Home Page: https://grofwildjacht.inbo.be/

[BUG] error in data reading

SanderDevisscher opened this issue

Describe the bug
Something is still off. I can build the Docker image, but according to the local version of the app only data up to 31/12/2022 is available. However, when I look at the UAT on faunabeheer-dev.inbo.be I see that data up to 05/08/2023 is available. I suspect the current version reads the data from the wrong source.

To Reproduce
Steps to reproduce the behavior:

  1. Build the public Docker image locally
  2. Go to localhost
  3. Go to faunabeheer-dev.inbo.be
  4. Compare the maximum available dates in both apps and see the error

Expected behavior
Both apps read from the same data sources and therefore show the same maximum date.

Screenshots

Local version (0.3.0): [screenshot]

UAT version (0.2.9): [screenshot]

Additional context
Current UAT hash: 9c2fffa

I'm not sure where the data for faunabeheer-dev.inbo.be comes from. It should be defined in the ShinyProxy YAML, which I don't have access to; see #386 (comment).

The data in the inbo-wbe-uat-data bucket indeed has a maximum date of 31/12/2022.

library(reportingGrofwild)
setupS3()                                 # set up the S3 credentials
ecoData <- loadRawData(type = "eco")      # read the ecology data from the bucket
max(ecoData$afschot_datum, na.rm = TRUE)
#> [1] "2022-12-31"

@berthuygens @TheJenne18 can you look into the data source of faunabeheer-dev.inbo.be?

@TheJenne18 verified that the UAT app gets its data from the UAT bucket (and PRD from the PRD bucket).

But I suspect the two versions read different files from the UAT bucket. There are currently three ecology files on the bucket:

  1. "rshiny_reporting_data_ecology.csv" with max(df$afschot_datum, na.rm = TRUE) == "2023-07-19" => no longer updated in favor of the reportingGrofwild preprocessing function createRawData().
  2. "rshiny_reporting_data_ecology_processed.csv" with max(df$afschot_datum, na.rm = TRUE) == "2023-08-05"
  3. "rshiny_reporting_data_ecology_processed.RData", which I'm unable to load using aws.s3::s3load(); it always returns NULL.

I suspect the UAT reads from "rshiny_reporting_data_ecology_processed.csv" while the local Docker version reads from "rshiny_reporting_data_ecology_processed.RData".
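
For reference, the bucket contents can be listed from R to double-check which of these files each version actually picks up. A sketch using aws.s3, assuming setupS3() has already configured the credentials; the column selection is illustrative:

library(aws.s3)

# list all objects in the UAT bucket and keep the ecology files
objects <- get_bucket_df(bucket = "inbo-wbe-uat-data")
objects[grepl("ecology", objects$Key), c("Key", "LastModified", "Size")]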

Additionally, when running the code you provided (excluding setupS3()), max(ecoData$afschot_datum, na.rm = TRUE) yields "2023-08-05" with reportingGrofwild v0.2.9, but the same code yields "2022-12-31" with v0.3.0, suggesting that something substantial changed in how the data is loaded between versions.

Thanks for explaining, now I see the problem (I think).

The function createRawData() takes "rshiny_reporting_data_ecology.csv" as input file and outputs "rshiny_reporting_data_ecology_processed.RData".

So the problem should be fixed if you rename your input file to rshiny_reporting_data_ecology.csv and run the createRawData() function again.
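
Something along these lines, as a sketch only: the dataDir placeholder and the "eco" type value (mirroring the loadRawData() call above) are assumptions on my side:

# rerun the ecology preprocessing after renaming the input file;
# tempdir_eco is an illustrative placeholder for the folder holding
# rshiny_reporting_data_ecology.csv
createRawData(dataDir = tempdir_eco,
              type = "eco",
              bucket = "inbo-wbe-uat-data")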

I suspect similar changes are needed for the other raw input data files listed here.

On a side note

3. "rshiny_reporting_data_ecology_processed.RData" which I'm unable to load using aws.s3::s3load(). It allways returns NULL

Each .RData file contains an object called rawData, which can then be assigned to another variable depending on the data type. Note that s3load(), like base load(), loads the objects into the environment rather than returning them.
So this should work:

aws.s3::s3load(bucket = "inbo-wbe-uat-data", object = "rshiny_reporting_data_ecology_processed.RData")
dim(rawData)        # the loaded object is always called rawData

ecoData <- rawData  # assign it to a variable matching the data type
dim(ecoData)

Using reportingGrofwild v0.3.0 I get some errors when updating the wnm (waarnemingen) and spread data. I'll add the error messages after fieldwork.

When running createSpreadData() I get the following error right after loading "Municipalities_ModelOutput_toekomst_verspreiding_2023.shp":

Error in `[.data.frame`(x@data, i, j, ..., drop = FALSE) :
undefined columns selected

When running

createRawData(dataDir = tempdir_wnm,
                type = "waarnemingen",
                bucket = "inbo-wbe-uat-data")

I get the following error:

Error in `[.data.table`(x, i, which = TRUE) : 
  When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
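
For context, this is generic data.table join behaviour rather than anything specific to reportingGrofwild: when i is a data.table, the join columns must be resolvable via on=, a key on x, or shared column names. A minimal illustration with made-up tables:

library(data.table)

deer <- data.table(gemeente_id = 1:3, aantal = c(10L, 4L, 7L))
keep <- data.table(id = 2L)

# deer[keep, which = TRUE]  # fails: no key, no 'on=', no shared column names
deer[keep, which = TRUE, on = c(gemeente_id = "id")]  # join deer$gemeente_id to keep$id
#> [1] 2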

Running both functions with reportingGrofwild 0.2.9 results in no errors.

Could you share the input files you use for these two? With the files I have, it works.

Sorry for the delay.

  • For waarnemingen: I had to update the code to fix the error.
  • For spread data: The files in the zip folder are named differently from before, e.g. the new "Model_output_Pixels" versus the old "Pixels_ModelOutput_toekomst_verspr". @SanderDevisscher I can update the code for this, unless you plan to keep using the old names in the future?

For both: I've updated the data in the UAT bucket with respect to the files you shared.

Sorry for the delay.

  • For waarnemingen: I had to update the code to fix the error.

I'll test the changes.

  • For spread data: The files in the zip folder are named differently from before, e.g. the new "Model_output_Pixels" versus the old "Pixels_ModelOutput_toekomst_verspr". @SanderDevisscher I can update the code for this, unless you plan to keep using the old names in the future?

No need, I'll change the filenames on our side.

For both: I've updated the data in the UAT bucket with respect to the files you shared.

Updating waarnemingen works as it should.

My sessionInfo:

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=Dutch_Belgium.utf8  LC_CTYPE=Dutch_Belgium.utf8    LC_MONETARY=Dutch_Belgium.utf8 LC_NUMERIC=C                   LC_TIME=Dutch_Belgium.utf8    

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] testthat_3.1.10         sp_2.0-0                reportingGrofwild_0.3.0 lubridate_1.9.2.9000    forcats_1.0.0           stringr_1.5.0          
 [7] dplyr_1.1.2             purrr_1.0.1             readr_2.1.4             tidyr_1.3.0             tibble_3.2.1            ggplot2_3.4.3          
[13] tidyverse_2.0.0         janitor_2.2.0           aws.s3_0.3.21           svDialogs_1.1.0        

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3    rstudioapi_0.15.0     jsonlite_1.8.7        magrittr_2.0.3        rmarkdown_2.24        fs_1.6.2              vctrs_0.6.2          
  [8] memoise_2.0.1         config_0.3.1          base64enc_0.1-3       webshot_0.5.5         htmltools_0.5.5       usethis_2.1.6         curl_5.0.2           
 [15] sass_0.4.7            KernSmooth_2.23-20    bslib_0.5.1           desc_1.4.2            htmlwidgets_1.6.2     plyr_1.8.8            plotly_4.10.2        
 [22] cachem_1.0.8          conflicted_1.2.0      mime_0.12             lifecycle_1.0.3       pkgconfig_2.0.3       Matrix_1.6-0          R6_2.5.1             
 [29] fastmap_1.1.1         shiny_1.7.5           snakecase_0.11.0      digest_0.6.31         showtext_0.9-6        colorspace_2.1-0      shinycssloaders_1.0.0
 [36] ps_1.7.5              rprojroot_2.0.3       pkgload_1.3.2.1       aws.signature_0.6.0   crosstalk_1.2.0       fansi_1.0.4           timechange_0.2.0     
 [43] httr_1.4.7            mgcv_1.8-42           compiler_4.3.0        proxy_0.4-27          remotes_2.4.2.1       bit64_4.0.5           withr_2.5.0          
 [50] DBI_1.1.3             pkgbuild_1.4.0        aws.ec2metadata_0.2.0 sessioninfo_1.2.2     leaflet_2.1.2         classInt_0.4-9        units_0.8-3          
 [57] odbc_1.3.4            httpuv_1.6.11         glue_1.6.2            callr_3.7.3           nlme_3.1-162          promises_1.2.0.1      grid_4.3.0           
 [64] sf_1.0-14             reshape2_1.4.4        generics_0.1.3        gtable_0.3.4          tzdb_0.4.0            class_7.3-21          data.table_1.14.8    
 [71] hms_1.1.3             xml2_1.3.5            utf8_1.2.3            pillar_1.9.0          vroom_1.6.3           later_1.3.1           splines_4.3.0        
 [78] lattice_0.21-8        showtextdb_3.0        bit_4.0.5             tidyselect_1.2.0      miniUI_0.1.1.1        knitr_1.43            xfun_0.40            
 [85] devtools_2.4.5        brio_1.1.3            DT_0.29               stringi_1.7.12        lazyeval_0.2.2        yaml_2.3.7            geojsonsf_2.0.3      
 [92] evaluate_0.21         cli_3.6.1             xtable_1.8-4          munsell_0.5.0         processx_3.8.1        jquerylib_0.1.4       svGUI_1.0.1          
 [99] Rcpp_1.0.10           parallel_4.3.0        ellipsis_0.3.2        assertthat_0.2.1      blob_1.2.4            prettyunits_1.1.1     profvis_0.3.8        
[106] urlchecker_1.0.1      flexdashboard_0.6.2   INBOtheme_0.5.9       viridisLite_0.4.2     scales_1.2.1          fortunes_1.5-4        sysfonts_0.8.8       
[113] e1071_1.7-13          crayon_1.5.2          rlang_1.1.1           waldo_0.5.1

@mvarewyck any idea which package is the culprit?

I'll need to investigate more. I now get another error:

Reading layer `Model_output_Municipalities_2023' from data source 
  `/home/mvarewyck/git/reporting-rshiny-grofwildjacht/data/Model_output_Municipalities_2023.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 308 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.541329 ymin: 50.68749 xmax: 5.911109 ymax: 51.50511
Geodetic CRS:  GCS_WGS_84_with_axis_order_normalized_for_visualization
Error: arguments have different crs
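
The usual way to resolve this kind of mismatch is to reproject one layer onto the CRS of the other before combining them. A generic sf sketch; the object names are illustrative, not the actual package code:

library(sf)

# reproject the municipalities layer onto the CRS of the layer it is combined with
if (st_crs(municipalities) != st_crs(reference_layer)) {
  municipalities <- st_transform(municipalities, st_crs(reference_layer))
}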

A similar issue (different data availability between 0.3.0 & 0.2.9, the latter being correct and up to date, as described in #414 (comment)) occurs with the Wildschade data.

What is the correct name of the wildschade_georef file used in createRawData()? I currently provide WildSchade_georef.csv.

A similar issue (different data availability between 0.3.0 & 0.2.9, the latter being correct and up to date, as described in #414 (comment)) occurs with the Wildschade data.

What is the correct name of the wildschade_georef file used in createRawData()? I currently provide WildSchade_georef.csv.

WildSchade_georef.csv is the correct name.
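
For completeness, rerunning the wildschade preprocessing would then look roughly like the waarnemingen call earlier in this thread. A sketch only: the type value "wildschade" and the tempdir_wildschade placeholder are guesses, not verified against the package:

# assumed call pattern mirroring the waarnemingen example above;
# tempdir_wildschade stands for the folder containing WildSchade_georef.csv
createRawData(dataDir = tempdir_wildschade,
              type = "wildschade",
              bucket = "inbo-wbe-uat-data")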

I've made spatialData a compulsory function argument to preprocess the spread data. This fixes #414 (comment):

readS3(file = "spatialData_sf.RData")        # loads the spatialData object
createSpreadData(spatialData = spatialData)

@SanderDevisscher If you send me the raw data file for wildschade, I'll have a look at it.

The file size of the raw data exceeds 25 MB, so I had to WeTransfer the file. You should have received a mail.