CoNLL 2018 Shared Task participation
dumitrescustefan opened this issue
Stefan Dumitrescu commented
We should participate in this year's shared task: Multilingual Parsing from Raw Text to Universal Dependencies (http://universaldependencies.org/conll18/).
This means we need to prepare the code & scripts for this task.
As far as I can see right now, we have three distinct cases:
- end-to-end decoding: we run every step ourselves, starting from raw text
- we start from UDPipe tokenization and perform tagging, parsing and lemmatization
- we start from UDPipe tokenization, tagging and lemmatization and perform only parsing
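To make the three cases concrete, here is a minimal sketch that maps each case to the steps our own models would still have to run. All names (`ALL_STEPS`, `CASES`, `own_steps`, and the case labels) are hypothetical and only illustrate the split; they are not taken from the existing code.

```python
# Hypothetical names throughout; this only illustrates the three cases above.
ALL_STEPS = ("tokenization", "tagging", "lemmatization", "parsing")

# Steps taken from UDPipe output in each case; everything else is ours to run.
CASES = {
    "end_to_end": (),
    "udpipe_tokenization": ("tokenization",),
    "udpipe_up_to_parsing": ("tokenization", "tagging", "lemmatization"),
}


def own_steps(case):
    """Return the steps our pipeline must run for the given case."""
    external = CASES[case]
    return [step for step in ALL_STEPS if step not in external]


if __name__ == "__main__":
    for name in CASES:
        print(name, "->", own_steps(name))
```

Keeping the split in one table should make it easy to pick a case per language later on.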
TODO
- update the runtime CLI to accept a list of models that need to be run on the input, for instance:
--run=[tokenization, parsing, tagging, lemmatization]
or --run=[parsing, tagging]
- implement scripts tailored for the UD Shared Task, which take as input the supplied XML with the list of input test files and run a custom NLPCube pipeline, depending on the language code
- deploy on the TIRA testing environment
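For the first TODO item, a rough sketch of how the `--run` option could be parsed with Python's standard `argparse`. The option name and the bracketed list syntax come from the examples above; `KNOWN_STEPS`, `parse_run_list` and `build_parser` are hypothetical names, and the real runtime CLI may be structured differently.

```python
import argparse

# Assumed set of step names, taken from the --run examples in this issue.
KNOWN_STEPS = {"tokenization", "tagging", "lemmatization", "parsing"}


def parse_run_list(value):
    """Turn '[parsing, tagging]' or 'parsing,tagging' into ['parsing', 'tagging']."""
    items = [item.strip() for item in value.strip().strip("[]").split(",")]
    steps = [item for item in items if item]
    unknown = [step for step in steps if step not in KNOWN_STEPS]
    if not steps or unknown:
        raise argparse.ArgumentTypeError("invalid --run value: %r" % value)
    return steps


def build_parser():
    """Build a parser exposing only the hypothetical --run option."""
    parser = argparse.ArgumentParser(description="Runtime CLI sketch")
    parser.add_argument(
        "--run",
        type=parse_run_list,
        required=True,
        help="comma-separated list of models to run, e.g. [parsing,tagging]",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.run)
```

Note that on most shells the bracketed form with spaces has to be quoted, e.g. `--run="[parsing, tagging]"`, otherwise it is split into several arguments.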
Expected Result
Training and evaluation scripts, trained models, and handling of low-resource languages.