TexNLP

TexNLP: Texas Natural Language Processing tools

This is the site for the TexNLP code used in the following papers:

Jason Baldridge. 2008. Weakly supervised supertagging with grammar-informed initialization. In Proceedings of COLING-2008. Manchester, UK.
Jason Baldridge and Alexis Palmer. 2009. How well does active learning actually work? Time-based evaluation of cost-reduction strategies for language documentation. In Proceedings of EMNLP-09. Singapore.
Alexis Palmer, Taesun Moon, Jason Baldridge, Katrin Erk, Eric Campbell, and Telma Can. 2010. Computational strategies for reducing annotation effort in language documentation: A case study in creating interlinear texts for Uspanteko. Linguistic Issues in Language Technology 3(4):1-42.

The code supports supervised and semi-supervised training of Hidden Markov Models (HMMs) for tagging, as well as standard supervised Maximum Entropy Markov Models (MEMMs) built with the TADM toolkit. It also provides additional support for working with Combinatory Categorial Grammar (CCG) categories, especially for supertagging with CCGbank.
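For readers new to HMM tagging, the following sketch illustrates the decoding step such a tagger performs: a Viterbi search for the most probable tag sequence given initial, transition, and emission probabilities. It is not taken from TexNLP. The class `ViterbiSketch`, its `decode` method, and the array-based model layout are hypothetical, and training (supervised counting or semi-supervised EM) is not shown.

```java
import java.util.Arrays;

/**
 * Minimal Viterbi decoder for a first-order HMM tagger.
 * Illustrative sketch only: names and model layout are hypothetical
 * and are not part of the TexNLP API.
 */
public final class ViterbiSketch {

    /**
     * Returns the most probable tag sequence for the observed word indices.
     *
     * @param words observed word indices, one per token
     * @param init  init[t] = P(first tag = t)
     * @param trans trans[i][j] = P(next tag = j | current tag = i)
     * @param emit  emit[t][w] = P(word = w | tag = t)
     */
    public static int[] decode(int[] words, double[] init, double[][] trans, double[][] emit) {
        int n = words.length;
        int k = init.length;
        if (n == 0) return new int[0];
        double[][] score = new double[n][k]; // best log-probability of a path ending in tag t at position i
        int[][] back = new int[n][k];         // backpointer to the best previous tag

        // First token: initial probability times emission probability (in log space).
        for (int t = 0; t < k; t++) {
            score[0][t] = Math.log(init[t]) + Math.log(emit[t][words[0]]);
        }
        // Remaining tokens: extend the best path into each tag.
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int prev = 0; prev < k; prev++) {
                    double s = score[i - 1][prev] + Math.log(trans[prev][t]);
                    if (s > best) { best = s; arg = prev; }
                }
                score[i][t] = best + Math.log(emit[t][words[i]]);
                back[i][t] = arg;
            }
        }
        // Pick the best final tag, then follow backpointers to recover the sequence.
        int[] tags = new int[n];
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (score[n - 1][t] > score[n - 1][last]) last = t;
        }
        tags[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
            tags[i - 1] = back[i][tags[i]];
        }
        return tags;
    }

    public static void main(String[] args) {
        // Toy model: tags 0 = DET, 1 = NOUN; words 0 = "the", 1 = "dog".
        double[] init = {0.8, 0.2};
        double[][] trans = {{0.1, 0.9}, {0.6, 0.4}};
        double[][] emit = {{0.9, 0.1}, {0.1, 0.9}};
        int[] tags = decode(new int[] {0, 1}, init, trans, emit);
        System.out.println(Arrays.toString(tags)); // prints [0, 1]
    }
}
```

Scores are kept in log space to avoid numerical underflow on long sentences; a practical tagger would also need smoothing for unseen words, which this sketch omits.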

Please cite Baldridge (2008) if you use this software. Note that it is not user-friendly and is poorly documented; please email Jason Baldridge (jbaldrid@mail.utexas.edu) if you have questions about getting things working.

Download: TexNLP v0.2.0

License: LGPL

Contributors: Jason Baldridge, Taesun Moon, Elias Ponvert

The development of the software and the research behind it were carried out as part of the EARL project, supported under NSF grant No. 06651988, "Reducing Annotation Effort in the Documentation of Languages using Machine Learning and Active Learning."

About


GNU Lesser General Public License v3.0

Languages

Java 86.6%, Shell 7.1%, Python 6.3%