proycon / pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common and less common NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. It also offers more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients to interface with various NLP-specific servers. Most notably, PyNLPl features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Home Page: https://pypi.python.org/pypi/PyNLPl

[folia] Validator does not complain when multiple t attributes of the same class are defined!

proycon opened this issue

foliavalidator should raise an exception here, but it reports success; folialint does show the correct behaviour:

$ foliavalidator Knaagtandje_.folia.xml
Validated successfully: Knaagtandje_.folia.xml

$ folialint Knaagtandje_.folia.xml
FAIL: XML error: attempt to add <t> with class=current to element: Knaagtandje_.id61 which already has a <t> with that class

This affects the BASILEX corpus, in which multiple elements have been incorrectly placed under paragraphs; curation is needed.
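The rule that folialint enforces above (an element may not carry more than one `<t>` with the same class) can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not foliavalidator's or PyNLPl's actual implementation. Several details are assumptions: the namespace URI `http://ilk.uvt.nl/folia`, the treatment of a missing `class` attribute as `current`, and the hypothetical helper name `find_duplicate_text_classes`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed FoLiA namespace URI; element tags are qualified with it after parsing.
FOLIA_NS = "http://ilk.uvt.nl/folia"
T_TAG = f"{{{FOLIA_NS}}}t"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_duplicate_text_classes(root):
    """Hypothetical helper: return (xml:id, class) pairs for every element
    that has more than one direct <t> child with the same class.
    A missing class attribute is treated as 'current' (assumed default)."""
    problems = []
    for elem in root.iter():
        # Count only the direct <t> children of this element, keyed by class.
        counts = Counter(
            child.get("class", "current") for child in elem if child.tag == T_TAG
        )
        for cls, n in counts.items():
            if n > 1:
                problems.append((elem.get(XML_ID), cls))
    return problems


# Illustrative document: a paragraph holding two <t> elements of the same class.
sample = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc" version="2.0">
  <text xml:id="doc.text">
    <p xml:id="doc.p.1">
      <t>Eerste tekst.</t>
      <t>Tweede tekst.</t>
    </p>
  </text>
</FoLiA>"""

print(find_duplicate_text_classes(ET.fromstring(sample)))
# → [('doc.p.1', 'current')]
```

Counting per parent keeps the check local to each element. As a result, `<t>` elements with different classes (for example `current` and `original`) on the same element are accepted, and only true same-class duplicates are flagged.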