explosion / tokenizations

Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/

Unable to install `pytokenizations` in Google Colab

yyahav opened this issue

When I run this cell:

!pip install -U pip # update pip
!pip install maturin
!pip install pytokenizations

I get the following output:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (23.1.2)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting maturin
  Using cached maturin-0.14.17-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl (10.5 MB)
Requirement already satisfied: tomli>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from maturin) (2.0.1)
Installing collected packages: maturin
Successfully installed maturin-0.14.17
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytokenizations
  Using cached pytokenizations-0.8.4.tar.gz (3.8 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (pyproject.toml) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

When I try to build pytokenizations on Colab, the following output is logged:

⚠️  Warning: Please use maturin in pyproject.toml with a version constraint, e.g. `requires = ["maturin>=0.14,<0.15"]`. This will become an error.
💥 maturin failed
  Caused by: Cargo metadata failed. Do you have cargo in your PATH?
  Caused by: No such file or directory (os error 2)

It seems that, because Cargo is not available in Colab, pytokenizations cannot be installed on this platform as-is.
The only workaround I see is to install Cargo by adding this line to the notebook:
!apt-get install cargo
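
A quick way to confirm this diagnosis from a notebook cell is to check whether the Rust tools are on PATH before attempting the source install. The following is a minimal sketch using only the Python standard library; the helper names `rust_toolchain_status` and `can_build_from_source` are made up for illustration and are not part of maturin or pytokenizations.

```python
import shutil


def rust_toolchain_status():
    """Return the resolved path of each Rust tool, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in ("cargo", "rustc")}


def can_build_from_source():
    """Assumption: a source build needs at least cargo to be discoverable on PATH."""
    return rust_toolchain_status()["cargo"] is not None


if __name__ == "__main__":
    for tool, path in rust_toolchain_status().items():
        print(f"{tool}: {path or 'not found on PATH'}")
    if not can_build_from_source():
        print("cargo is missing, so building from the sdist is expected to fail.")
```

If `cargo` is reported as missing, the "Cargo metadata failed" error above is the expected outcome of a source build.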

Yes, Rust and Cargo are required to install pytokenizations from source. I think you're noticing this right now because Colab just upgraded its default Python to 3.10, and the last pytokenizations release, which is a few years old at this point, has no precompiled wheels for that version.

If you'd like to use precompiled wheels for the same functionality, you can try our drop-in replacement library spacy-alignments, which is just pytokenizations with slightly modified Python packaging. It has more recent releases with wheels for Python 3.10 and 3.11.
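
As a rough illustration of what "drop-in" can look like in practice, the sketch below tries to import `spacy_alignments` first and falls back to `tokenizations` (the import name used by pytokenizations). The helper `load_alignment_module` is hypothetical and only wraps the standard `importlib` machinery; the example assumes one of the two packages has already been installed, for instance with `pip install spacy-alignments`, and it calls `get_alignments`, which both packages expose.

```python
import importlib


def load_alignment_module(candidates=("spacy_alignments", "tokenizations")):
    """Return the first importable module from `candidates`, or None if none is installed."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


if __name__ == "__main__":
    aligner = load_alignment_module()
    if aligner is None:
        print("Neither spacy-alignments nor pytokenizations is installed.")
    else:
        # Align two tokenizations of the same underlying text.
        a2b, b2a = aligner.get_alignments(["å", "BC"], ["abc"])
        print(aligner.__name__, a2b, b2a)
```

Code written this way keeps working whichever of the two packages is available in the environment.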

commented

@adrianeboyd Thank you for the explanation! I'll close this since it is not an actual defect, and I'll give spacy-alignments a go.