nicolay-r / AREkit

Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML

Home Page: https://nicolay-r.github.io/arekit-page/

AREkit 0.25.0

AREkit (Attitude and Relation Extraction Toolkit) is a Python toolkit for document-level attitude and relation extraction between text objects in mass-media news.

Description

This toolkit aims at memory-efficient data processing in Relation Extraction (RE) related tasks.

Figure: AREkit pipeline design. For more details, see the paper "ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction".

In particular, this framework provides the following features:

  • pipelines and iterators for serializing large-scale collections without out-of-memory issues,
  • 🔗 EL (entity-linking) API support for objects,
  • ➰ avoidance of cyclic connections,
  • 📏 distance consideration between relation participants (measured in terms or sentences),
  • 📑 relation annotation and filtering rules,
  • *️⃣ entity formatting or masking, and more.
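To make the ideas behind these features concrete, the following is a minimal sketch in plain Python. It does not use the AREkit API: `Entity`, `iter_pairs`, `mask`, and `iter_samples`, as well as the `#S`/`#O` placeholders and the default distance limits, are hypothetical names and values chosen for illustration. The sketch assumes each document arrives as a list of terms plus a list of entities, and it relies on generators so that only one document is held in memory at a time.

```python
from itertools import permutations
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Entity(NamedTuple):
    value: str     # surface form of the entity
    sent_ind: int  # index of the sentence that contains the entity
    term_ind: int  # index of the entity's term within the document


def iter_pairs(entities: List[Entity], max_sent_dist: int = 0,
               max_term_dist: int = 10) -> Iterator[Tuple[Entity, Entity]]:
    """Lazily yield ordered (source, target) pairs within the distance limits.

    permutations() never pairs an entity with itself, so no self-loops appear.
    """
    for src, tgt in permutations(entities, 2):
        if abs(src.sent_ind - tgt.sent_ind) > max_sent_dist:
            continue  # participants are too far apart in sentences
        if abs(src.term_ind - tgt.term_ind) > max_term_dist:
            continue  # participants are too far apart in terms
        yield src, tgt


def mask(terms: List[str], src: Entity, tgt: Entity) -> List[str]:
    """Replace the source and target terms with placeholder tokens."""
    return ["#S" if i == src.term_ind else "#O" if i == tgt.term_ind else t
            for i, t in enumerate(terms)]


def iter_samples(docs: Iterable[Tuple[List[str], List[Entity]]]) -> Iterator[str]:
    """Stream masked text samples, one document at a time."""
    for terms, entities in docs:
        for src, tgt in iter_pairs(entities):
            yield " ".join(mask(terms, src, tgt))
```

Because `iter_samples` is a generator, its output can be written to disk row by row, so the full set of samples never needs to fit in memory.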

The core functionality includes:

  • API for document presentation with EL (Entity Linking, i.e. Object Synonymy) support for preparing sentence-level relations (dubbed contexts);
  • API for context extraction;
  • Transfer of relations from the sentence level to the document level, and more.
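As an illustration of the last item, one common way to lift sentence-level labels to the document level is majority voting over all contexts that mention the same pair, with entity synonyms collapsed to a single representative. The sketch below assumes this strategy; it is not confirmed to be the method AREkit implements. The function `to_document_level` and the `synonyms` mapping are hypothetical and do not belong to the AREkit API.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def to_document_level(context_labels: Iterable[Tuple[str, str, str]],
                      synonyms: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Aggregate (source, target, label) context triples by majority vote.

    `synonyms` maps an entity value to the representative of its synonym
    group; values missing from the mapping represent themselves.
    """
    votes = defaultdict(Counter)
    for src, tgt, label in context_labels:
        pair = (synonyms.get(src, src), synonyms.get(tgt, tgt))
        votes[pair][label] += 1
    # Keep the most frequent label for every document-level pair.
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}
```

For example, two negative contexts and one positive context for the same linked pair would yield a single negative document-level relation.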

Installation

pip install git+https://github.com/nicolay-r/AREkit.git@0.25.0-rc

Usage

Please follow the tutorial section on the project Wiki for more details.

How to cite

Great research is also accompanied by faithful references. If you use or extend our work, please cite it as follows:

@inproceedings{rusnachenko2024arelight,
  title={ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction},
  author={Rusnachenko, Nicolay and Liang, Huizhi and Kolomeets, Maxim and Shi, Lei},
  booktitle={European Conference on Information Retrieval},
  year={2024},
  organization={Springer}
}

About

License: MIT License


Languages

Python 99.9%, Shell 0.1%