Parsing error @ Windows & RcppSimdJson 0.1.2
iMissile opened this issue
Hi, Dirk.
I use your package extensively because simdjson is extremely fast.
Everything worked with version 0.1.1.
After upgrading to v0.1.2, I spent an enormous amount of time tracking down a parsing bug in a large set of complicated JSON.
I finally reproduced the errors, and it looks like there is an encoding issue in the library. Below are minimal examples and the output for versions 0.1.1 and 0.1.2.
Testing code
json_text <- stringi::stri_enc_toutf8(
c('{"store": "234S", "basket_uid": "7323-236565-2312-112",
"номер корзины": "000-342342-592-0232",
"номер_корзины": "111-342342-592-0232"}')
)
jsonlite::fromJSON(json_text)
RcppSimdJson::fparse(json_text, query = "basket_uid")
RcppSimdJson::fparse(json_text, query = "номер корзины")
RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер корзины"))
RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер_корзины"))
v0.1.1 output
> json_text <- stringi::stri_enc_toutf8(
+ c('{"store": "234S", "basket_uid": "7323-236565-2312-112",
+ "номер корзины": "000-342342-592-0232",
+ "номер_корзины": "111-342342-592-0232"}')
+ )
>
> jsonlite::fromJSON(json_text)
$store
[1] "234S"
$basket_uid
[1] "7323-236565-2312-112"
$`номер корзины`
[1] "000-342342-592-0232"
$номер_корзины
[1] "111-342342-592-0232"
>
> RcppSimdJson::fparse(json_text, query = "basket_uid")
[1] "7323-236565-2312-112"
> RcppSimdJson::fparse(json_text, query = "номер корзины")
Error in .deserialize_json(json = json, query = query, empty_array = empty_array, :
The JSON field referenced does not exist in this object.
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер корзины"))
[1] "000-342342-592-0232"
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер_корзины"))
[1] "111-342342-592-0232"
My environment:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=Russian_Russia.1251 LC_CTYPE=Russian_Russia.1251
[3] LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C
[5] LC_TIME=Russian_Russia.1251
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 dbplyr_1.4.4 pillar_1.4.6
[4] compiler_4.0.2 tools_4.0.2 bit_4.0.4
[7] digest_0.6.27 lubridate_1.7.9 anytime_0.3.9
[10] evaluate_0.14 jsonlite_1.7.1 lifecycle_0.2.0
[13] tibble_3.0.4 checkmate_2.0.0 pkgconfig_2.0.3
[16] rlang_0.4.8 DBI_1.1.0 rstudioapi_0.11
[19] yaml_2.2.1 parallel_4.0.2 xfun_0.18
[22] dplyr_1.0.2 stringr_1.4.0 configr_0.3.5
[25] knitr_1.30 generics_0.0.2 vctrs_0.3.4
[28] htmlwidgets_1.5.2 bit64_4.0.5 rprojroot_1.3-2
[31] tidyselect_1.1.0 RClickhouse_0.5.2 glue_1.4.2
[34] here_0.1 R6_2.4.1 rmarkdown_2.5
[37] tidyr_1.1.2 blob_1.2.1 purrr_0.3.4
[40] magrittr_1.5 listviewer_3.0.0 backports_1.1.10
[43] scales_1.1.1 ellipsis_0.3.1 htmltools_0.5.0
[46] assertthat_0.2.1 fst_0.9.4 formattable_0.2.0.1
[49] colorspace_1.4-1 stringi_1.5.3 ini_0.3.1
[52] munsell_0.5.0 RcppTOML_0.1.6 RcppSimdJson_0.1.1
[55] crayon_1.3.4
v0.1.2 buggy output
> install.packages("RcppSimdJson")
Installing package into ‘C:/Users/User/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/RcppSimdJson_0.1.2.zip'
Content type 'application/zip' length 2019382 bytes (1.9 MB)
downloaded 1.9 MB
package ‘RcppSimdJson’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\User\AppData\Local\Temp\Rtmp6HBKTa\downloaded_packages
> # reproduce the failing case
> json_text <- stringi::stri_enc_toutf8(
+ c('{"store": "234S", "basket_uid": "7323-236565-2312-112",
+ "номер корзины": "000-342342-592-0232",
+ "номер_корзины": "111-342342-592-0232"}')
+ )
>
> jsonlite::fromJSON(json_text)
$store
[1] "234S"
$basket_uid
[1] "7323-236565-2312-112"
$`номер корзины`
[1] "000-342342-592-0232"
$номер_корзины
[1] "111-342342-592-0232"
>
> RcppSimdJson::fparse(json_text, query = "basket_uid")
Error in .deserialize_json(json = json, query = query, empty_array = empty_array, :
Invalid JSON pointer syntax.
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер корзины"))
Error in .deserialize_json(json = json, query = query, empty_array = empty_array, :
Invalid JSON pointer syntax.
> RcppSimdJson::fparse(json_text, query = stringi::stri_enc_toutf8("номер_корзины"))
Error in .deserialize_json(json = json, query = query, empty_array = empty_array, :
Invalid JSON pointer syntax.
Ilya
Please see #51, filed a few days ago. I wish people would take the time to look at existing issues.
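In 0.1.2 even the plain ASCII key `basket_uid` fails with "Invalid JSON pointer syntax.", which suggests the query string is now read as a JSON Pointer rather than as a bare key. The sketch below assumes (this is not confirmed in the thread) that 0.1.2 expects RFC 6901 pointers, which begin with "/" and escape "~" as "~0" and "/" as "~1". `to_json_pointer()` is a hypothetical helper, not part of RcppSimdJson; it also converts the key to UTF-8 with `enc2utf8()`, since the session above runs in a CP1251 locale.

```r
# Hypothetical helper (not part of RcppSimdJson): turn a single object key
# into an RFC 6901 JSON Pointer, assuming fparse() in 0.1.2 expects that syntax.
to_json_pointer <- function(key) {
  key <- enc2utf8(key)                        # normalise e.g. CP1251 input to UTF-8
  key <- gsub("~", "~0", key, fixed = TRUE)   # escape "~" first ...
  key <- gsub("/", "~1", key, fixed = TRUE)   # ... then "/", so the new "~1" is not re-escaped
  paste0("/", key)                            # a pointer to a top-level member starts with "/"
}

to_json_pointer("basket_uid")  # "/basket_uid"
to_json_pointer("a/b~c")       # "/a~1b~0c"
```

Under that assumption, a call such as `RcppSimdJson::fparse(json_text, query = to_json_pointer("номер корзины"))` would be the pointer-style equivalent of the queries above; it has not been verified against 0.1.2 here. The escape order matters: escaping "/" first would turn its "~1" into "~01".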