UniversalDependencies / UD_Bambara-CRB

Bambara data.

Summary

The UD Bambara treebank is a section of the Corpus Référence du Bambara annotated natively with Universal Dependencies.

Introduction

Bambara (also known as Bamana) is the most widely spoken language of the Manding language group (Niger-Congo > Mande > Western Mande). It is spoken mainly in Mali by 13-14 million people, of whom around four million are L1 speakers. Development of the Bambara Reference Corpus started in April 2012 (Vydrin 2013, Maslinsky 2014). The corpus includes a non-disambiguated sub-corpus and a disambiguated one. At present, the whole corpus contains about nine million tokens. The corpus was annotated using the UD Annotatrix annotation tool (Tyers, Sheyanova, Washington 2018).

Documentation for the treebank is available on the UD web site.

Acknowledgments

The conversion and annotation were done by Katya Aplonova and Francis M. Tyers at the Higher School of Economics in Moscow. We would like to thank the developers and annotators of the Corpus Référence du Bambara for permission to base this treebank on their work.

Citation

If you use this corpus in your research, please cite:

@inproceedings{aplonova_2018,
author = {Aplonova, K. and Tyers, F. M.},
title = {Towards a dependency treebank for Bambara},
booktitle = {Proceedings of the 16th Conference on Treebanks and Linguistic Theories},
pages = {138--146},
year = 2018
}

References

  • Maslinsky, K. (2014). Daba: a model and tools for Manding corpora. In Proceedings of TALAf 2014 : Traitement Automatique des Langues Africaines, pages 114–122.
  • Tyers, F. M., Sheyanova, M., and Washington, J. N. (2018). UD Annotatrix: An annotation tool for Universal Dependencies. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories.
  • Vydrin, V. (2013). Bamana reference corpus (BRC). Procedia - Social and Behavioral Sciences, 95, pages 75–80.

Changelog

  • 2022-11-15 v2.11
    • Fixed validation errors.
  • 2021-05-15 v2.8
    • Normalized lemmatization of auxiliaries.
  • 2019-05-15 v2.4
    • Normalized Unicode.
  • 2018-11-15 v2.3
    • Initial release in Universal Dependencies.

=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.3
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Aplonova, Katya; Tyers, Francis
Contributing: here
Contact: aplooon@gmail.com
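The metadata block above is a list of `Key: value` fields, so a script can read it with very little code. The following is a minimal sketch, not part of the UD tooling: the function name `parse_ud_metadata` is hypothetical, and it assumes each field sits on its own line. The sample string reuses a few of the fields listed above.

```python
def parse_ud_metadata(block: str) -> dict[str, str]:
    """Parse 'Key: value' lines from a UD README metadata block.

    Hypothetical helper: assumes one field per line.
    """
    metadata = {}
    for line in block.splitlines():
        line = line.strip()
        # Skip blank lines and the '=== ... ===' delimiter lines.
        if not line or line.startswith("="):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


EXAMPLE_BLOCK = """\
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.3
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Contributors: Aplonova, Katya; Tyers, Francis
"""

metadata = parse_ud_metadata(EXAMPLE_BLOCK)
# Genres are space-separated; contributors are separated by semicolons.
genres = metadata["Genre"].split()
contributors = [name.strip() for name in metadata["Contributors"].split(";")]
```

Splitting on the first colon only keeps the parser safe for any value that itself contains a colon.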
