xurxodiz / raposa

Lexicological framework for pipeline text processing.

RAPOSA

Description

RAPOSA processes text word by word, applying a sequence of filters in conveyor-belt fashion.

Define a pipeline with its tokenization method and the tubes through which the tokens will travel. A tube may modify a token, discard it, tag it, or do any combination of the three. Some basic pipelines and tubes are included, but every case is different, so customization is the key guiding principle. We therefore encourage you to read the demo.py file and the code itself to learn how to create and combine your own derived classes.

The intended use case for RAPOSA is lexicological analysis, and it is especially convenient for neology, lexicography and morphology, but its open-ended, customizable design suits many other purposes. For this reason, it also includes many other NLP/CompLing goodies.

RAPOSA is not tied to any specific language, though it currently contains filters for only some languages because development time is limited. RAPOSA is especially proud to support minoritized and minority languages.

Contributions are always warmly welcome and appreciated!

Use

As the software is under development, look at demo.py for examples until proper docs are in place.
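
In the meantime, the pipeline-and-tubes idea can be illustrated with a small standalone sketch. It does not use RAPOSA's actual classes: every name in it (Token, Pipeline, LowercaseTube, StopwordTube, SuffixTagTube, process, run) is hypothetical and chosen only for illustration, so check demo.py for the real API.

```python
import re


class Token:
    """A word plus the set of tags that tubes have attached to it."""

    def __init__(self, text):
        self.text = text
        self.tags = set()


class LowercaseTube:
    """Modifies the token: normalizes its text to lowercase."""

    def process(self, token):
        token.text = token.text.lower()
        return token


class StopwordTube:
    """Discards the token (returns None) if its text is a stopword."""

    def __init__(self, stopwords):
        self.stopwords = set(stopwords)

    def process(self, token):
        return None if token.text in self.stopwords else token


class SuffixTagTube:
    """Tags the token when its text ends with a given suffix."""

    def __init__(self, suffix, tag):
        self.suffix = suffix
        self.tag = tag

    def process(self, token):
        if token.text.endswith(self.suffix):
            token.tags.add(self.tag)
        return token


class Pipeline:
    """Tokenizes text and sends each token through the tubes in order."""

    def __init__(self, tubes):
        self.tubes = tubes

    def tokenize(self, text):
        # Naive tokenization (an assumption): runs of word characters.
        return [Token(word) for word in re.findall(r"\w+", text)]

    def run(self, text):
        for token in self.tokenize(text):
            for tube in self.tubes:
                token = tube.process(token)
                if token is None:  # a tube discarded the token
                    break
            else:
                # No tube discarded the token, so it leaves the pipeline.
                yield token


pipeline = Pipeline([
    LowercaseTube(),
    StopwordTube({"the", "a", "of"}),
    SuffixTagTube("ing", "gerund-like"),
])
result = [(t.text, sorted(t.tags)) for t in pipeline.run("The processing of a text")]
print(result)
```

Each tube here does exactly one thing (modify, discard, or tag), and the order of the tubes matters: lowercasing runs first so that the stopword check also catches "The".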

Patronage

The initial version of this package was developed under a research scholarship from the Deputación da Coruña for the year 2016.

License

The software is released under the MIT License (see the LICENSE file in the root folder for details), except for the following resources, which are derivative work:
