OscarKjell / text

Using Transformers from HuggingFace in R

Home Page:https://r-text.org

Error while running textrpp_install()

cenotechnology opened this issue · comments

Hi there,
thanks for developing this program; I highly appreciate your contribution.

The installation was interrupted, and the system showed the following message:

`Building wheel for tokenizers (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [616 lines of output]
running bdist_wheel
running build
running build_py`

ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Error: Error installing package(s): "'torch==2.0.0'", "'transformers==4.19.2'", "numpy", "pandas", "'nltk==3.6.7'", "scikit-learn", "'datasets==2.9.0'", "evaluate"

Many thanks for your support.

I should add that I followed the instructions provided online and ran:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
but the problem persists.
Kind regards!

Could you run the following code and paste the output here?
reticulate::py_list_packages("textrpp_condaenv")

Also, please run sessionInfo() and paste the output here.

Many thanks for your prompt response.
Here is the output of the first command:

> reticulate::py_list_packages("textrpp_condaenv")
           package    version                requirement     channel
1  ca-certificates 2023.11.17 ca-certificates=2023.11.17 conda-forge
2           libcxx     16.0.6              libcxx=16.0.6 conda-forge
3           libffi        3.3                 libffi=3.3 conda-forge
4        libsqlite     3.44.2           libsqlite=3.44.2 conda-forge
5          libzlib     1.2.13             libzlib=1.2.13 conda-forge
6          ncurses        6.4                ncurses=6.4 conda-forge
7          openssl     1.1.1w             openssl=1.1.1w conda-forge
8              pip     23.3.1                 pip=23.3.1 conda-forge
9           python      3.9.0               python=3.9.0 conda-forge
10        readline        8.2               readline=8.2 conda-forge
11      setuptools     68.2.2          setuptools=68.2.2 conda-forge
12          sqlite     3.44.2              sqlite=3.44.2 conda-forge
13              tk     8.6.13                  tk=8.6.13 conda-forge
14          tzdata      2023c               tzdata=2023c conda-forge
15           wheel     0.42.0               wheel=0.42.0 conda-forge
16              xz      5.2.6                   xz=5.2.6 conda-forge
17            zlib     1.2.13                zlib=1.2.13 conda-forge

Here is the output of the second command:

`> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reticulate_1.34.0 text_1.0

loaded via a namespace (and not attached):
[1] gtable_0.3.4 ggplot2_3.4.4 recipes_1.0.8 overlapping_2.1 lattice_0.22-5 vctrs_0.6.4
[7] tools_4.3.2 generics_0.1.3 parallel_4.3.2 tibble_3.2.1 fansi_1.0.5 pkgconfig_2.0.3
[13] Matrix_1.6-3 data.table_1.14.8 lhs_1.1.6 GPfit_1.0-8 lifecycle_1.0.4 compiler_4.3.2
[19] brio_1.1.3 munsell_0.5.0 codetools_0.2-19 DiceDesign_1.9 class_7.3-22 tune_1.1.2
[25] prodlim_2023.08.28 pillar_1.9.0 furrr_0.3.1 tidyr_1.3.0 MASS_7.3-60 gower_1.0.1
[31] yardstick_1.2.0 iterators_1.0.14 foreach_1.5.2 rpart_4.1.21 parallelly_1.36.0 lava_1.7.3
[37] dials_1.2.0 tidyselect_1.2.0 digest_0.6.33 stringi_1.8.2 future_1.33.0 dplyr_1.1.4
[43] purrr_1.0.2 listenv_0.9.0 splines_4.3.2 cowplot_1.1.1 parsnip_1.1.1 grid_4.3.2
[49] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 survival_3.5-7 utf8_1.2.4 future.apply_1.11.0
[55] withr_2.5.2 scales_1.2.1 lubridate_1.9.3 timechange_0.2.0 globals_0.16.2 nnet_7.3-19
[61] timeDate_4022.108 png_0.1-8 workflows_1.1.3 testthat_3.2.0 hardhat_1.3.0 rsample_1.2.0
[67] rlang_1.1.2 Rcpp_1.0.11 glue_1.6.2 ipred_0.9-14 rstudioapi_0.15.0 jsonlite_1.8.7
[73] R6_2.5.1`

Once again, thanks for your help.

It seems your environment is set up correctly.
You do not need to run the curl command manually; the installation of the text package runs it automatically.
So just try reinstalling the text package with the steps below.

  1. First, uninstall Miniconda by manually deleting the folder returned by reticulate::miniconda_path() (it should be similar to /Users/macID/Library/r-miniconda-arm64, inside the Library folder; replace macID with your Mac user name).
  2. Uninstall the text package (if it is installed).
  3. Install the GitHub version by running install.packages("devtools") and then devtools::install_github("oscarkjell/text").
  4. Follow the steps in the extended installation guide.
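
For reference, steps 1 to 3 above could be scripted roughly as follows. This is only a sketch: `reinstall_text` is a hypothetical helper name introduced here, not part of the text package, and it assumes that reticulate is already installed and that deleting the whole Miniconda folder is acceptable on your machine.

```r
# Sketch of steps 1-3 above. Deleting the Miniconda folder is destructive,
# so everything is wrapped in a function and nothing runs until it is called.
# `reinstall_text` is a hypothetical helper name, not part of the text package.
reinstall_text <- function() {
  # Step 1: delete the Miniconda folder that reticulate reports.
  conda_dir <- reticulate::miniconda_path()
  if (dir.exists(conda_dir)) {
    unlink(conda_dir, recursive = TRUE, force = TRUE)
  }

  # Step 2: remove the text package if it is installed.
  if ("text" %in% rownames(utils::installed.packages())) {
    utils::remove.packages("text")
  }

  # Step 3: install the GitHub version.
  utils::install.packages("devtools")
  devtools::install_github("oscarkjell/text")
}

# reinstall_text()  # uncomment to run; then continue with step 4
```

Restarting R after running it is probably a good idea, since the session may still have the old text package loaded.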

Many thanks for your support. But the problem persists. Perhaps, it is a compatibility issue.

Did you have any version of conda (especially Anaconda or Miniconda) installed before installing the text package?
The incompatibility may be caused by an incorrect path to the Python package installer.

Thanks Moomoofarm1. I did not have any other version of Anaconda or Miniconda on the laptop. I think it has to do with macOS. This morning, I tried to install it on a Windows laptop, and it was successful.
My macOS version is: 14.1.1 (23B81) (just for reference).
Have a great day!

Glad to hear it worked.
If the incompatibility comes up again, please paste the message printed by reticulate::py_last_error().
It might help.

Many thanks. I ran it, but got null as output.

'ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Error: Error installing package(s): "'torch==2.0.0'", "'transformers==4.19.2'", "numpy", "pandas", "'nltk==3.6.7'", "scikit-learn", "'datasets==2.9.0'", "evaluate"

reticulate::py_last_error()
NULL
reticulate::py_last_error()
NULL
'

Maybe installing tokenizers directly will help:
reticulate::conda_install(envname="textrpp_condaenv", packages=c("tokenizers==0.13.1"), pip=TRUE)

Finally! Many thanks, it works. However, I also had to install the other packages "manually". I still had some issues when I tried to install transformers 4.19.2; once I changed it to 4.35.2, it all worked out. The system has successfully initialised textrpp, and I successfully tested textEmbed("hello").
Here are the packages that I installed "manually":
reticulate::conda_install(envname="textrpp_condaenv", packages=c("tokenizers==0.13.1"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("torch==2.0.0"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("nltk==3.6.7"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("datasets==2.9.0"), pip=TRUE)
reticulate::conda_install(envname="textrpp_condaenv", packages=c("transformers==4.35.2"), pip=TRUE)
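
The same manual installs could be wrapped in a small loop, which makes it easier to see which package fails. This is a sketch under assumptions: `install_pinned` and its `installer` argument are hypothetical names introduced here (not part of text or reticulate), and the version pins are simply the ones reported above as working on this particular Mac, not officially supported versions.

```r
# Version pins reported above as working on this Apple Silicon Mac.
pinned_packages <- c(
  "tokenizers==0.13.1",
  "torch==2.0.0",
  "nltk==3.6.7",
  "datasets==2.9.0",
  "transformers==4.35.2"
)

# Hypothetical helper (not part of text or reticulate): installs each pin
# one at a time with pip, in order, so a failure points at a single package.
# `installer` defaults to reticulate::conda_install and can be swapped out
# for a dry run.
install_pinned <- function(packages,
                           envname = "textrpp_condaenv",
                           installer = reticulate::conda_install) {
  for (pkg in packages) {
    message("Installing ", pkg, " into ", envname)
    installer(envname = envname, packages = pkg, pip = TRUE)
  }
  invisible(packages)
}

# install_pinned(pinned_packages)  # uncomment to run the real installs
```

Installing tokenizers first follows the order used above, since the tokenizers wheel was the package that failed to build.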

Once again many thanks for your support and patience!

I will update the code to automate the installation if necessary in the future.

FYI, I tried to install the package on my MacBook with an M2 processor using the normal procedure, and the same issue persisted. Installing the packages manually, as described by the OP, worked. Maybe this could be added to the advanced installation guide for Apple M1/M2 users?