explosion / spacy-alignments

💫 A spaCy package for Yohei Tamura's Rust tokenizations library

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

spacy-alignments: Align tokenizations for spaCy + transformers

A spaCy package for Yohei Tamura's Rust tokenizations library with Python bindings.

Installation

pip install -U pip setuptools wheel
pip install spacy-alignments

If no binary wheel is available for your platform, you will need to install Rust in order to build spacy-alignments from source.

spacy-alignments vs. pytokenizations

The spacy_alignments module is a drop-in replacement for tokenizations:

import spacy_alignments as tokenizations
a2b, b2a = tokenizations.get_alignments(["Ã¥", "BC"], ["abc"])
assert a2b == [[0], [0]]
assert b2a == [[0, 1]]

The only difference between this package and the original pytokenizations is that it switches the build system to setuptools-rust to make it easier for us at Explosion to build source and binary packages for a wider range of platforms.

Bug reports and other issues

Please use spaCy's issue tracker to report a bug, or open a new thread on the discussion board for any other issue.

About

💫 A spaCy package for Yohei Tamura's Rust tokenizations library

License:MIT License


Languages

Language:Python 82.8%Language:Rust 17.2%