diyclassics / la_senter

Repository for training a spaCy-compatible sentence segmenter for Latin

🪐 spaCy Project: la_senter

Code required to train a spaCy-compatible sentence segmenter for Latin.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| assets | Download assets |
| preprocess | Convert the data to spaCy's format |
| load-vectors | Load floret vectors |
| train | Train senter |
| evaluate | Evaluate on the test data and save the metrics |
| package | Package the trained model so it can be installed |
| document | Document senter |
| clean | Remove intermediate files |

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| all | assets → preprocess → load-vectors → train → evaluate → package → document → clean |
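
As an illustration of how the commands and the all workflow listed above are invoked, the following is a minimal sketch that builds the `spacy project run [name]` call from Python. The helper names `build_run_command` and `run` are hypothetical and not part of this repository; the command names are copied from the tables above.

```python
import subprocess
import sys

# Command names as listed in the Commands table above.
COMMANDS = [
    "assets", "preprocess", "load-vectors", "train",
    "evaluate", "package", "document", "clean",
]
# The single workflow defined by the project runs every command in order.
WORKFLOWS = {"all": COMMANDS}


def build_run_command(name):
    """Return the argv list equivalent to `spacy project run <name>`."""
    if name not in COMMANDS and name not in WORKFLOWS:
        raise ValueError(f"unknown command or workflow: {name!r}")
    # `python -m spacy` calls the same CLI as the `spacy` executable.
    return [sys.executable, "-m", "spacy", "project", "run", name]


def run(name, project_dir="."):
    """Run a command or workflow inside the project directory (hypothetical helper)."""
    subprocess.run(build_run_command(name), cwd=project_dir, check=True)
```

Calling `run("train")` would run a single step, and `run("all")` the whole workflow; both assume spaCy is installed and that `project_dir` contains the project.yml.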

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets/treebanks/UD_Latin-Perseus | Git | |
| assets/treebanks/UD_Latin-PROIEL | Git | |
| assets/treebanks/UD_Latin-ITTB | Git | |
| assets/treebanks/UD_Latin-LLCT | Git | |
| assets/treebanks/UD_Latin-UDante | Git | |

Install

  • To install the current version...
    • pip install https://huggingface.co/diyclassics/la_senter/resolve/main/la_senter-0.3.0/dist/la_senter-0.3.0.tar.gz
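
Once the package is installed, it should be loadable like any other spaCy pipeline. The sketch below assumes the installed package is importable under the name `la_senter`; the helper names `load_segmenter` and `split_sentences` are hypothetical and only illustrate reading sentence boundaries from `doc.sents`.

```python
def load_segmenter(name="la_senter"):
    """Load the installed pipeline; spaCy is imported lazily here."""
    import spacy

    return spacy.load(name)


def split_sentences(nlp, text):
    """Return the sentence strings that the pipeline finds in `text`."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]
```

A typical call would be `split_sentences(load_segmenter(), "Gallia est omnis divisa in partes tres. Quarum unam incolunt Belgae.")`, which returns one string per predicted sentence.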

Current version

| Feature | Description |
| --- | --- |
| Name | la_senter |
| Version | 0.3.0 |
| spaCy | >=3.4.2,<3.6.0 |
| Default Pipeline | senter |
| Components | senter |
| Vectors | -1 keys, 50000 unique vectors (300 dimensions) |
| Sources | UD_Latin-Perseus, UD_Latin-PROIEL, UD_Latin-ITTB, UD_Latin-LLCT, UD_Latin-UDante |
| License | MIT |
| Author | Patrick J. Burns |
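
Because the model declares a spaCy range of >=3.4.2,<3.6.0, it can help to check the installed spaCy version before loading. The following is a rough sketch using only the standard library; the function names are hypothetical, and the comparison is a simplified numeric one rather than full PEP 440 version handling.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '3.5.4' into (3, 5, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_supported(spacy_version, minimum="3.4.2", below="3.6.0"):
    """True if minimum <= spacy_version < below (the range stated above)."""
    return parse_version(minimum) <= parse_version(spacy_version) < parse_version(below)


def installed_spacy_is_supported():
    """Check the installed spaCy; returns False when spaCy is not installed."""
    try:
        return is_supported(version("spacy"))
    except PackageNotFoundError:
        return False
```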

Accuracy

| Type | Score |
| --- | --- |
| SENTS_F | 99.55 |
| SENTS_P | 99.45 |
| SENTS_R | 99.65 |
| SENTER_LOSS | 4029.93 |
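
The three sentence scores are related: assuming SENTS_F is the usual F1 score, it is the harmonic mean of precision (SENTS_P) and recall (SENTS_R). The short sketch below recomputes it from the reported values; since those values are already rounded, agreement is only expected to two decimal places.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1), on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported SENTS_P and SENTS_R from the table above.
sents_f = f_score(99.45, 99.65)
print(round(sents_f, 2))  # 99.55
```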
