emiltj / DaCy-3.0.0

πŸͺ spaCy Project: Train spaCy transformer for Danish

This project template lets you train a named-entity recognition (NER) model on the DANSK dataset. It downloads the corpus, then trains and evaluates the model. The template uses one or more of the following transformer models, which are downloaded via Hugging Face:

  • "jonfd/electra-small-nordic"
  • "NbAiLab/nb-roberta-base-scandi"
  • "KennethEnevoldsen/dfm-bert-large-v1-2048bsz-1Msteps"

You can run any command or workflow defined in the YAML file with spacy project run WORKFLOW/COMMAND.
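As a minimal sketch, the same invocation can be scripted from Python using only the standard library. The helper names `project_run_command` and `run_project_step` are hypothetical. The sketch assumes spaCy is installed in the current interpreter and that the script is started from the project directory.

```python
import subprocess
import sys


def project_run_command(name, project_dir="."):
    """Build the argument list for `spacy project run NAME PROJECT_DIR`."""
    return [sys.executable, "-m", "spacy", "project", "run", name, project_dir]


def run_project_step(name, project_dir="."):
    """Run a command or workflow from project.yml; raises on a non-zero exit."""
    subprocess.run(project_run_command(name, project_dir), check=True)


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch is safe to try.
    print(" ".join(project_run_command("train")))
```

Calling `run_project_step("train_eval_pack_publ")` would then start the full test workflow, provided the project's assets and dependencies are in place.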

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| fetch_assets | Downloads DANSK to assets/ |
| split_dansk | Splits DANSK into train, dev and test sets |
| train | Trains a test DaCy model |
| evaluate | Evaluates the test model on test.spacy and saves the metrics |
| package | Packages the trained test model so it can be installed |
| publish | Publishes the test package to the Hugging Face model hub |
| train_all_models | Trains the small, medium and large DaCy models |
| evaluate_all_models | Evaluates all models on test.spacy and saves the metrics |
| package_all_models | Packages all trained models so they can be installed |
| publish_all_models | Publishes all model packages to the Hugging Face model hub |
| generate_readme | Auto-generates a README.md with a project description |
| clean | Removes intermediate files |
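The "only re-run if inputs have changed" behaviour can be pictured as comparing file checksums between runs. The following is an illustrative sketch of that idea, not spaCy's actual implementation. The function names and the in-memory `previous` record are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_run(deps):
    """Snapshot the checksums of all input files after a successful run."""
    return {str(dep): file_checksum(dep) for dep in deps}


def needs_rerun(deps, previous):
    """Return True if any input file is new or has a changed checksum.

    `deps` is a list of file paths; `previous` maps path -> checksum as
    returned by `record_run` (hypothetical storage, kept in memory here).
    """
    for dep in deps:
        if previous.get(str(dep)) != file_checksum(dep):
            return True
    return False
```

In this sketch a command is skipped when `needs_rerun` returns False, that is, when every input file still matches the checksum recorded after the previous run.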

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| train_eval_pack_publ | train → evaluate → package → publish |
| all_models_train_eval_pack_publ | train_all_models → evaluate_all_models → package_all_models → publish_all_models |
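To make the ordering concrete, the two workflows can be written out as plain data. This sketch mirrors the workflow names and steps listed here and does not read project.yml. The `WORKFLOWS` dictionary and the `expand` helper are hypothetical.

```python
# Workflow name -> ordered list of commands, copied from the workflow listing.
WORKFLOWS = {
    "train_eval_pack_publ": ["train", "evaluate", "package", "publish"],
    "all_models_train_eval_pack_publ": [
        "train_all_models",
        "evaluate_all_models",
        "package_all_models",
        "publish_all_models",
    ],
}


def expand(name):
    """Return the ordered commands for a workflow, or [name] for a single command."""
    return list(WORKFLOWS.get(name, [name]))


if __name__ == "__main__":
    for step in expand("train_eval_pack_publ"):
        print(step)
```

Running a workflow is therefore equivalent to running its commands one after another in the listed order.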

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets/ | Local | |
