adobe / NLP-Cube

Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing

Home Page: http://opensource.adobe.com/NLP-Cube/index.html

CoNLL 2018 Shared Task participation

dumitrescustefan opened this issue

We should participate in this year's shared task, Multilingual Parsing from Raw Text to Universal Dependencies (http://universaldependencies.org/conll18/).

This means we need to prepare the code & scripts for this task.

As far as I can see right now, we have three distinct cases:

  • end-to-end decoding: NLP-Cube performs every step, starting from raw text
  • we start from UDPipe tokenization and perform tagging, parsing and lemmatization
  • we start from UDPipe tokenization, tagging and lemmatization, and perform only parsing
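
To make the three cases concrete, here is a minimal sketch of how they could be written down as configurations. All names (`STEPS`, `CASES`, `split_steps`, the case labels) are hypothetical and are not part of the existing NLP-Cube code; the step order is also an assumption.

```python
# Hypothetical names throughout; this is not NLP-Cube's actual API.
# Assumed canonical order of the pipeline steps.
STEPS = ("tokenization", "tagging", "lemmatization", "parsing")

# For each case, the steps NLP-Cube would run itself.
# Every remaining step is taken from the UDPipe output.
CASES = {
    "end_to_end": {"tokenization", "tagging", "lemmatization", "parsing"},
    "udpipe_tokenization": {"tagging", "lemmatization", "parsing"},
    "udpipe_preprocessed": {"parsing"},
}


def split_steps(case):
    """Return (cube_steps, udpipe_steps) for a case, both in pipeline order."""
    cube = CASES[case]
    cube_steps = [step for step in STEPS if step in cube]
    udpipe_steps = [step for step in STEPS if step not in cube]
    return cube_steps, udpipe_steps


if __name__ == "__main__":
    for name in CASES:
        cube_steps, udpipe_steps = split_steps(name)
        print(name, "| NLP-Cube:", cube_steps, "| UDPipe:", udpipe_steps)
```

Keeping the cases as data rather than as separate scripts would let one entry point cover all three.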

TODO

  • update the runtime CLI to accept a list of models to be run on the input, for instance: --run=[tokenization, parsing, tagging, lemmatization] or --run=[parsing, tagging]
  • implement scripts tailored for the UD Shared Task, which take as input the supplied XML with the list of input test files and run a custom NLPCube pipeline, depending on the language code
  • deploy on the TIRA testing environment
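
As a starting point for the first TODO item, the `--run` option could be parsed roughly as below. This is only a sketch built on Python's standard `argparse`; `KNOWN_STEPS`, `parse_run_list` and `build_parser` are hypothetical names, and reordering the requested steps into a fixed pipeline order is an assumption, not something decided in this issue.

```python
import argparse

# Hypothetical step names, in an assumed pipeline order.
KNOWN_STEPS = ("tokenization", "tagging", "lemmatization", "parsing")


def parse_run_list(value):
    """Parse '[parsing, tagging]' or 'parsing,tagging' into a list of steps."""
    # Drop surrounding whitespace and the optional brackets, then split on commas.
    names = [n.strip() for n in value.strip().strip("[]").split(",") if n.strip()]
    unknown = [n for n in names if n not in KNOWN_STEPS]
    if unknown:
        raise argparse.ArgumentTypeError("unknown step(s): " + ", ".join(unknown))
    # Assumption: steps run in pipeline order, not in the order they were typed.
    return [step for step in KNOWN_STEPS if step in names]


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a --run option")
    parser.add_argument(
        "--run",
        type=parse_run_list,
        default=list(KNOWN_STEPS),
        help="models to run, e.g. --run='[parsing, tagging]'",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("steps to run:", args.run)
```

On the command line, the bracketed form needs quoting (`--run='[parsing, tagging]'`) so the shell does not split it at the space or interpret the brackets.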

Expected Result

Training and evaluation scripts, trained models, and handling of low-resource languages.