Empty features `ud` and `catib`
mirkovogel opened this issue
The following observation concerns the LREC-Coling 2024 release (`camel_morph/official_releases/lrec-coling2024_release/databases/camel-morph-msa`).
The features `catib6` and `ud` are always empty, e.g. in the following analysis of "فبسبب":
```
'bw': 'فَ/CONJ+بِ/PREP+سَبَب/NOUN+ِ/CASE_DEF_GEN',
'ud': '',
'catib6': ''
```
The expected values are:
```
'ud': 'CCONJ+ADP+NOUN',
'catib6': 'PRT+PRT+NOM'
```
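The expected values above line up segment by segment with the `bw` field. `CONJ` maps to `CCONJ`/`PRT`, `PREP` maps to `ADP`/`PRT`, `NOUN` maps to `NOUN`/`NOM`, and the case ending adds no token. The sketch below shows that derivation. The mapping table, the set of "attached" tag prefixes and the function name `bw_to_ud_catib6` are hypothetical illustrations. They are not taken from camel_morph or camel_tools, and they cover only a handful of tags.

```python
# Sketch: derive '+'-joined UD and CATiB6 strings from a Buckwalter (bw)
# analysis string. The mapping table below is a hypothetical, partial
# illustration, not the mapping used by camel_morph or camel_tools.
BW_TO_UD_CATIB6 = {
    "CONJ": ("CCONJ", "PRT"),
    "PREP": ("ADP", "PRT"),
    "NOUN": ("NOUN", "NOM"),
    "NOUN_PROP": ("PROPN", "PROP"),
    "ADJ": ("ADJ", "NOM"),
}

# Assumption: inflectional suffix tags (case endings, nominal suffixes)
# belong to the preceding stem and do not form a token of their own.
ATTACHED_TAG_PREFIXES = ("CASE_", "NSUFF_")


def bw_to_ud_catib6(bw: str) -> tuple[str, str]:
    ud_tags, catib_tags = [], []
    for segment in bw.split("+"):
        # Each segment has the form 'form/TAG'; the tag follows the last '/'.
        tag = segment.rsplit("/", 1)[-1]
        if tag.startswith(ATTACHED_TAG_PREFIXES):
            continue
        if tag not in BW_TO_UD_CATIB6:
            # Fail loudly instead of silently emitting an empty tag.
            raise KeyError(f"no mapping for BW tag {tag!r}")
        ud, catib = BW_TO_UD_CATIB6[tag]
        ud_tags.append(ud)
        catib_tags.append(catib)
    return "+".join(ud_tags), "+".join(catib_tags)


print(bw_to_ud_catib6("فَ/CONJ+بِ/PREP+سَبَب/NOUN+ِ/CASE_DEF_GEN"))
# ('CCONJ+ADP+NOUN', 'PRT+PRT+NOM')
```

A real mapping would need many more tags, such as verb stems and their subject suffixes, pronominal enclitics, and passive verbs (CATiB6 `VRB-PASS`). Raising `KeyError` on unknown tags makes those gaps visible rather than producing the empty strings reported here.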
Comment from @christios (by email):
As you've rightly pointed out, `ud` and `catib` are missing because we did not include them in the release (it was not our focus). But you are right that they should be included in the next release. It should not be very difficult; it is probably just a mapping between the CAPHI POS (or CATiB) and UD.
I am currently transitioning my pipeline from the r13 morphological DB to Camel Morph MSA and need both `catib6` and `ud` tags downstream, so I'd volunteer to help with this if I can.
Is there perhaps already code that converts the "native" POS tags of the database (https://camel-tools.readthedocs.io/en/latest/reference/camel_morphology_features.html) to other tag sets, which I could use in the meantime?
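Until such a mapping ships, one stopgap is to build both tags from the native features of each analysis. These are the proclitics `prc3` to `prc0`, the base `pos`, and the enclitic `enc0`, all documented on the page linked above. The following is a minimal sketch under several assumptions. The tables, the rule of matching clitic values by their suffix (e.g. `fa_conj` ends in `_conj`), the treatment of `Al_det` as attached to its stem, and the example feature values for "فبسبب" are all hypothetical. None of them is an existing camel_tools API.

```python
# Sketch: derive UD and CATiB6 strings from native CAMeL Tools features.
# Feature names (prc3..prc0, pos, enc0) follow the CAMeL Tools docs; the
# tables and the suffix-matching rule are hypothetical and partial.
CLITIC_SUFFIX_MAP = {
    "_conj": ("CCONJ", "PRT"),
    "_prep": ("ADP", "PRT"),
    "_part": ("PART", "PRT"),
    "_neg": ("PART", "PRT"),
    "_poss": ("PRON", "NOM"),
    "_dobj": ("PRON", "NOM"),
    "_pron": ("PRON", "NOM"),
}

POS_MAP = {
    "noun": ("NOUN", "NOM"),
    "noun_prop": ("PROPN", "PROP"),
    "adj": ("ADJ", "NOM"),
    "verb": ("VERB", "VRB"),  # passive verbs (CATiB6 VRB-PASS) not handled
    "prep": ("ADP", "PRT"),
    "conj": ("CCONJ", "PRT"),
    "punc": ("PUNCT", "PNX"),
}

EMPTY_VALUES = {"0", "na", ""}
# Assumption: the definite article stays attached to its stem (no own token).
ATTACHED_CLITICS = {"Al_det"}
PROCLITIC_FEATURES = ("prc3", "prc2", "prc1", "prc0")  # outermost first
ENCLITIC_FEATURES = ("enc0",)


def _map_clitic(value: str) -> tuple[str, str]:
    # Match on the clitic's functional suffix, e.g. 'bi_prep' -> '_prep'.
    for suffix, tags in CLITIC_SUFFIX_MAP.items():
        if value.endswith(suffix):
            return tags
    raise KeyError(f"no mapping for clitic value {value!r}")


def _clitic_tags(analysis: dict, features: tuple) -> list:
    tags = []
    for feat in features:
        value = analysis.get(feat, "0")
        if value in EMPTY_VALUES or value in ATTACHED_CLITICS:
            continue
        tags.append(_map_clitic(value))
    return tags


def derive_ud_catib6(analysis: dict) -> tuple[str, str]:
    # Proclitics, then the base word, then enclitics, in surface order.
    tokens = (
        _clitic_tags(analysis, PROCLITIC_FEATURES)
        + [POS_MAP[analysis["pos"]]]
        + _clitic_tags(analysis, ENCLITIC_FEATURES)
    )
    ud = "+".join(u for u, _ in tokens)
    catib6 = "+".join(c for _, c in tokens)
    return ud, catib6


example = {"prc2": "fa_conj", "prc1": "bi_prep", "prc0": "0", "pos": "noun", "enc0": "0"}
print(derive_ud_catib6(example))  # ('CCONJ+ADP+NOUN', 'PRT+PRT+NOM')
```

Matching on suffixes keeps the clitic table short, since every `*_conj` value maps the same way. The trade-off is that any value with an unlisted suffix raises `KeyError` and has to be added by hand.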