eddelbuettel / rcppsimdjson

Rcpp Bindings for the 'simdjson' Header Library

Parsing error @ Windows & RcppSimdJson 0.1.2

iMissile opened this issue

Hi, Dirk.
I use your package extensively because simdjson is extremely fast.
Everything worked with version 0.1.1.
After upgrading to v0.1.2 I spent an enormous amount of time tracking down a parsing bug in a large set of complicated JSON.
I finally reproduced the errors, and it looks like there is an encoding issue in the library. Below are a minimal example and its output for versions 0.1.1 and 0.1.2.

Testing code

json_text <- stringi::stri_enc_toutf8(
  c('{"store": "234S", "basket_uid": "7323-236565-2312-112", 
               "номер корзины": "000-342342-592-0232",
               "номер_корзины": "111-342342-592-0232"}')
  )

jsonlite::fromJSON(json_text)

RcppSimdJson::fparse(json_text, query = "basket_uid")
RcppSimdJson::fparse(json_text, query = "номер корзины")
RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер корзины"))
RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер_корзины"))

v 0.1.1 output

> json_text <- stringi::stri_enc_toutf8(
+   c('{"store": "234S", "basket_uid": "7323-236565-2312-112", 
+                "номер корзины": "000-342342-592-0232",
+                "номер_корзины": "111-342342-592-0232"}')
+   )
> 
> jsonlite::fromJSON(json_text)
$store
[1] "234S"

$basket_uid
[1] "7323-236565-2312-112"

$`номер корзины`
[1] "000-342342-592-0232"

$номер_корзины
[1] "111-342342-592-0232"

> 
> RcppSimdJson::fparse(json_text, query = "basket_uid")
[1] "7323-236565-2312-112"
> RcppSimdJson::fparse(json_text, query = "номер корзины")
Error in .deserialize_json(json = json, query = query, empty_array = empty_array,  :
  The JSON field referenced does not exist in this object.
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер корзины"))
[1] "000-342342-592-0232"
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер_корзины"))
[1] "111-342342-592-0232"
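
A possible reading of the v0.1.1 output above: on a session with a CP1251 locale, a bare string literal such as "номер корзины" is held in the native encoding, while the JSON text was converted to UTF-8, so the key bytes may simply not match until the query is converted as well. The sketch below uses base R only to show how the byte sequences differ. The variable names are illustrative, and it assumes the local iconv build supports the "CP1251" name.

```r
# "номер" written with \u escapes, so the string is UTF-8 no matter how
# this script file itself is saved.
key_utf8 <- "\u043d\u043e\u043c\u0435\u0440"

# The same text re-encoded as Windows-1251, which is what a string literal
# typed in a Russian_Russia.1251 session would contain (assumption).
key_cp1251 <- iconv(key_utf8, from = "UTF-8", to = "CP1251")

bytes_utf8   <- charToRaw(key_utf8)    # two bytes per Cyrillic letter
bytes_cp1251 <- charToRaw(key_cp1251)  # one byte per Cyrillic letter

print(bytes_utf8)
print(bytes_cp1251)

# A byte-wise key lookup cannot match when the encodings differ.
same_bytes <- identical(bytes_utf8, bytes_cp1251)
print(same_bytes)

# Converting the native string back to UTF-8 restores the matching bytes.
roundtrip <- iconv(key_cp1251, from = "CP1251", to = "UTF-8")
print(identical(charToRaw(roundtrip), bytes_utf8))
```

This would be consistent with the calls wrapped in stringi::stri_enc_toutf8() succeeding under v0.1.1 while the bare literal fails, though the thread itself does not confirm the cause.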

My environment is

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251   
[3] LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C                   
[5] LC_TIME=Russian_Russia.1251    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5          dbplyr_1.4.4        pillar_1.4.6       
 [4] compiler_4.0.2      tools_4.0.2         bit_4.0.4          
 [7] digest_0.6.27       lubridate_1.7.9     anytime_0.3.9      
[10] evaluate_0.14       jsonlite_1.7.1      lifecycle_0.2.0    
[13] tibble_3.0.4        checkmate_2.0.0     pkgconfig_2.0.3    
[16] rlang_0.4.8         DBI_1.1.0           rstudioapi_0.11    
[19] yaml_2.2.1          parallel_4.0.2      xfun_0.18          
[22] dplyr_1.0.2         stringr_1.4.0       configr_0.3.5      
[25] knitr_1.30          generics_0.0.2      vctrs_0.3.4        
[28] htmlwidgets_1.5.2   bit64_4.0.5         rprojroot_1.3-2    
[31] tidyselect_1.1.0    RClickhouse_0.5.2   glue_1.4.2         
[34] here_0.1            R6_2.4.1            rmarkdown_2.5      
[37] tidyr_1.1.2         blob_1.2.1          purrr_0.3.4        
[40] magrittr_1.5        listviewer_3.0.0    backports_1.1.10   
[43] scales_1.1.1        ellipsis_0.3.1      htmltools_0.5.0    
[46] assertthat_0.2.1    fst_0.9.4           formattable_0.2.0.1
[49] colorspace_1.4-1    stringi_1.5.3       ini_0.3.1          
[52] munsell_0.5.0       RcppTOML_0.1.6      RcppSimdJson_0.1.1 
[55] crayon_1.3.4  

v0.1.2 buggy output

> install.packages("RcppSimdJson")
Installing package into ‘C:/Users/User/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/RcppSimdJson_0.1.2.zip'
Content type 'application/zip' length 2019382 bytes (1.9 MB)
downloaded 1.9 MB

package ‘RcppSimdJson’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\User\AppData\Local\Temp\Rtmp6HBKTa\downloaded_packages
> # reproduce the erroneous situation
> json_text <- stringi::stri_enc_toutf8(
+   c('{"store": "234S", "basket_uid": "7323-236565-2312-112", 
+                "номер корзины": "000-342342-592-0232",
+                "номер_корзины": "111-342342-592-0232"}')
+   )
> 
> jsonlite::fromJSON(json_text)
$store
[1] "234S"

$basket_uid
[1] "7323-236565-2312-112"

$`номер корзины`
[1] "000-342342-592-0232"

$номер_корзины
[1] "111-342342-592-0232"

> 
> RcppSimdJson::fparse(json_text, query = "basket_uid")
Error in .deserialize_json(json = json, query = query, empty_array = empty_array,  :
  Invalid JSON pointer syntax.
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер корзины"))
Error in .deserialize_json(json = json, query = query, empty_array = empty_array,  :
  Invalid JSON pointer syntax.
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер_корзины"))
Error in .deserialize_json(json = json, query = query, empty_array = empty_array,  :
  Invalid JSON pointer syntax.

Ilya

Please see #51 filed a few days ago. I wish people would take the time to look at existing queries.
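
The "Invalid JSON pointer syntax" message in the v0.1.2 output suggests that `query` is being read as a JSON Pointer (RFC 6901), in which every reference token is prefixed with "/". Assuming that is what happens here (the contents of #51 are not quoted in this thread), a bare key such as "basket_uid" would need to be written as "/basket_uid". The helper below is hypothetical and not part of RcppSimdJson. It only builds a pointer string for a top-level key, applying the RFC 6901 escapes.

```r
# Hypothetical helper (not part of RcppSimdJson): turn a top-level object
# key into an RFC 6901 JSON Pointer string.
to_json_pointer <- function(key) {
  key <- enc2utf8(key)                       # JSON text is UTF-8
  key <- gsub("~", "~0", key, fixed = TRUE)  # escape "~" first
  key <- gsub("/", "~1", key, fixed = TRUE)  # then escape "/"
  paste0("/", key)
}

print(to_json_pointer("basket_uid"))

# Only attempted when the package is installed; assumes a version whose
# `query` argument expects JSON Pointer syntax.
if (requireNamespace("RcppSimdJson", quietly = TRUE)) {
  json_text <- '{"store": "234S", "basket_uid": "7323-236565-2312-112"}'
  print(RcppSimdJson::fparse(json_text, query = to_json_pointer("basket_uid")))
}
```

With v0.1.1, where bare keys were accepted, the leading "/" would presumably not be wanted, so the helper only makes sense for versions that report the pointer-syntax error.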