hmosousa / tei2go

Fast multilingual temporal expression identification.

Temporal Expression Identification to Go

Temporal Expression Identification to Go (TEI2GO) is an approach for fast and effective identification of temporal expressions. Currently, TEI2GO has models for six languages:

  • German
  • English
  • Spanish
  • Italian
  • French
  • Portuguese

The approach can be extended to other languages. If you would like to add one, feel free to open an issue, fork the repo, and submit a pull request.

🤗 HuggingFace Hub

To make the models easy to use, all TEI2GO models are published on the HuggingFace Hub. The steps below show how to install and load the French model.

First, install the model package from the command line:

pip install https://huggingface.co/hugosousa/fr_tei2go/resolve/main/fr_tei2go-any-py3-none-any.whl

Then the model can be loaded in two ways:

  1. Using spaCy
import spacy
nlp = spacy.load("fr_tei2go")
  2. Importing it as a module
import fr_tei2go
nlp = fr_tei2go.load()
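
Once loaded, `nlp` can be called on raw text like any spaCy pipeline. The sketch below is not taken from the TEI2GO code base. It assumes the model exposes the identified temporal expressions through the standard spaCy `doc.ents` attribute, and the helper names `extract_timexs` and `demo` are hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Tuple


def extract_timexs(nlp: Callable, text: str) -> List[Tuple[str, int, int, str]]:
    """Return (text, start_char, end_char, label) for each entity found in `text`.

    `nlp` is any loaded spaCy-style pipeline, e.g. the object returned by
    spacy.load("fr_tei2go"). Assumption: temporal expressions are stored in
    `doc.ents`, the usual place for spaCy entity spans.
    """
    doc = nlp(text)
    return [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


def demo() -> None:
    """Hypothetical usage; requires the fr_tei2go wheel installed as shown above."""
    import spacy

    nlp = spacy.load("fr_tei2go")
    for timex in extract_timexs(nlp, "Hier, le 20 mai 2023, je suis allé au cinéma."):
        print(timex)
```

Returning character offsets alongside the text makes it possible to map each expression back to its position in the original string. Calling `demo()` prints one tuple per identified expression; the label strings depend on the installed model.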

Development environment

virtualenv venv --python=python3.8
source venv/bin/activate
pip install -r requirements.txt

To check that everything is working, run the test suite with pytest:

python -m pytest tests

Train

python -m src.run spacy --data tempeval_3 ph_english --language en

Download Pre-Trained Models

cd models
sh download.sh

Download Resources

cd resources
sh download.sh

Meta

Hugo Sousa - hugo.o.sousa@inesctec.pt

This framework is part of the Text2Story project, which is financed by the ERDF – European Regional Development Fund through the North Portugal Regional Operational Programme (NORTE 2020), under PORTUGAL 2020, and by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project PTDC/CCI-COM/31857/2017 (NORTE-01-0145-FEDER-03185).

Cite

If you use this work, please cite the following paper:

@inproceedings{10.1145/3583780.3615130,
    author = {Sousa, Hugo and Campos, Ricardo and Jorge, Al\'{\i}pio},
    title = {TEI2GO: A Multilingual Approach for Fast Temporal Expression Identification},
    year = {2023},
    isbn = {9798400701245},
    publisher = {Association for Computing Machinery},
    url = {https://doi.org/10.1145/3583780.3615130},
    doi = {10.1145/3583780.3615130},
    booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
    pages = {5401–5406},
    numpages = {6},
    keywords = {temporal expression identification, multilingual corpus, weak label},
    location = {Birmingham, United Kingdom},
    series = {CIKM '23}
}
