Summary
The Macedonian-MTB treebank is a collection of annotated sentences based on the raw monolingual corpus called Macedonian Language Digital Resources (MLDR).
Introduction
The Macedonian-MTB treebank is a collection of annotated sentences drawn from the raw monolingual corpus Macedonian Language Digital Resources (MLDR), also known as 135 Volumes of Macedonian Literature, published by the Macedonian Academy of Sciences and Arts under the CC Attribution-NonCommercial 4.0 International License. The treebank consists mainly of literary texts, with a few non-fiction texts.
Acknowledgments
...
References
Changelog
- 2023-11-15 v2.13
- Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.13
License: CC BY-SA 4.0
Includes text: yes
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Cvetkoski, Vladimir
Contributing: here
Contact: cvetkoski@flf.ukim.edu.mk
===============================================================================