PlanTL-GOB-ES / utils

Miscellaneous utilities and scripts.

utils

This repository contains scripts for miscellaneous tasks. They are used internally, but they may also be useful to other researchers and developers.

All the scripts in this repository are independent of each other. We recommend that interested users clone the entire repository and remove the directories they do not need.

Directory structure

Each directory contains a script for a specific task. Here is a brief explanation of each tool:

Abbreviation_Extractor/
This script extracts abbreviations and their definitions from biomedical texts written in Spanish by detecting
abbreviations whose potential definitions are explicitly mentioned in the same sentence.
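
As an illustration of same-sentence detection, here is a minimal Python sketch. It is not the algorithm used by the script in this directory. It assumes the short form is written in capitals inside parentheses right after its definition, and that each letter matches the initial of one preceding word. The names SHORT_FORM and extract_abbreviations are hypothetical.

```python
import re

# Assumption: a short form is 2 to 10 capital letters inside parentheses.
SHORT_FORM = re.compile(r"\(([A-ZÁÉÍÓÚÑ]{2,10})\)")


def extract_abbreviations(sentence):
    """Return (short_form, long_form) pairs found in a single sentence."""
    pairs = []
    for match in SHORT_FORM.finditer(sentence):
        short = match.group(1)
        # Words that appear before the opening parenthesis.
        words = sentence[:match.start()].split()
        # Take as many preceding words as the short form has letters.
        candidate = words[-len(short):]
        if len(candidate) < len(short):
            continue
        initials = "".join(word[0].upper() for word in candidate)
        if initials == short:
            pairs.append((short, " ".join(candidate)))
    return pairs


print(extract_abbreviations(
    "Se realizó una tomografía computarizada (TC) de urgencia."
))
```

This simple initial-matching rule misses definitions that contain function words, such as "virus de la inmunodeficiencia humana (VIH)", so a practical extractor needs a more flexible alignment between letters and words.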

AnnotatorJS_2_Brat/
This script extracts annotations made with the AnnotatorJS library and converts them into Brat format.
AnnotatorJS annotations are stored in JSON format, and the converter builds ANN files from them.
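
The conversion can be sketched in Python as follows. This is an illustrative sketch, not the converter shipped in this directory. It assumes each annotation has the usual AnnotatorJS fields "quote", "ranges" and "tags", and that the annotated document is a single text node, so that startOffset and endOffset can be used directly as character offsets. The function name annotatorjs_to_ann and the fallback label "Annotation" are hypothetical.

```python
import json


def annotatorjs_to_ann(json_text):
    """Build Brat text-bound annotation lines from a JSON list of annotations."""
    annotations = json.loads(json_text)
    lines = []
    for index, annotation in enumerate(annotations, start=1):
        # Assumption: only the first range is used and its offsets are
        # already relative to the whole document.
        span = annotation["ranges"][0]
        # Assumption: the first tag is the entity type.
        tags = annotation.get("tags") or ["Annotation"]
        lines.append(
            f"T{index}\t{tags[0]} {span['startOffset']} {span['endOffset']}"
            f"\t{annotation['quote']}"
        )
    return "\n".join(lines)


sample = json.dumps([
    {
        "quote": "aspirina",
        "ranges": [{"start": "", "end": "", "startOffset": 10, "endOffset": 18}],
        "tags": ["Farmaco"],
    }
])
print(annotatorjs_to_ann(sample))
```

In real AnnotatorJS data the offsets are relative to the nodes named by the "start" and "end" XPath expressions, so a full converter also has to map them onto offsets in the plain text.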

FixEncodingErrors/
This script uses the sed utility to fix some of the most common encoding problems in corpora.
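
The same substitution idea can be sketched in Python. This is only an illustration and does not reproduce the sed rules in this directory. It assumes the problem being fixed is UTF-8 text that was wrongly decoded as Latin-1, and it covers only a small, hypothetical set of Spanish characters. The names MOJIBAKE and fix_encoding are hypothetical.

```python
# Assumption: the corpus contains UTF-8 text that was read as Latin-1,
# so each accented character shows up as a two-character sequence.
SPANISH_CHARS = "áéíóúñüÁÉÍÓÚÑÜ¿¡"

# Map each broken sequence to the character it should have been.
MOJIBAKE = {
    char.encode("utf-8").decode("latin-1"): char for char in SPANISH_CHARS
}


def fix_encoding(text):
    """Replace each known broken sequence, like a list of sed substitutions."""
    for broken, fixed in MOJIBAKE.items():
        text = text.replace(broken, fixed)
    return text


broken_sample = "canción española".encode("utf-8").decode("latin-1")
print(fix_encoding(broken_sample))
```

A fixed table like this leaves any sequence it does not list unchanged, which keeps the behavior predictable on text that is already clean.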

FreeLing_2_brat/
This script converts FreeLing tabular output format into BRAT standoff format.
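
A minimal Python sketch of this kind of conversion is shown below. It is not the converter in this directory. It assumes each non-empty input line has the columns form, lemma, tag and probability, that the original text is available, and that character offsets can be recovered by searching for each form in order. The function name freeling_to_brat and the sample data are hypothetical.

```python
def freeling_to_brat(tagged, text):
    """Turn tabular token lines into Brat text-bound annotation lines."""
    lines = []
    cursor = 0  # position in the text up to which tokens have been matched
    for row in tagged.splitlines():
        fields = row.split()
        if len(fields) < 3:
            continue  # blank lines separate sentences
        form, tag = fields[0], fields[2]
        start = text.find(form, cursor)
        if start == -1:
            continue  # form not found literally, for example a joined multiword
        end = start + len(form)
        cursor = end
        lines.append(f"T{len(lines) + 1}\t{tag} {start} {end}\t{form}")
    return "\n".join(lines)


sample_text = "El gato duerme"
sample_tagged = (
    "El el DA0MS0 1\n"
    "gato gato NCMS000 1\n"
    "duerme dormir VMIP3S0 1\n"
)
print(freeling_to_brat(sample_tagged, sample_text))
```

Each output line follows the Brat standoff layout: an identifier, then the type with start and end offsets, then the covered text, separated by tabs.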

License

See the LICENSE file in each folder.

Languages

Java 52.6%, Python 34.2%, Shell 8.5%, Perl 4.8%