MuffinLinwist / sidwellvietic

CLDF dataset derived from Sidwell and Alves' "Vietic Lexicon" from 2021

How to cite

If you use these data, please cite:

  • the original source

    Sidwell, Paul, & Alves, Mark. (2021). Vietic 116 item phylogenetic lexicon (First version (26 Aug 2021)eng) [Data set]. 9th International Conference on Austroasiatic Linguistics (ICAAL 9), Lund, Sweden. Zenodo. https://doi.org/10.5281/zenodo.5263195.

  • the derived dataset, using the DOI of the specific released version you used

Description

This dataset is licensed under the CC-BY-4.0 license.

Available online at https://zenodo.org/record/5263195

Notes

The data used here were extracted from the Excel file shared on Zenodo (https://zenodo.org/record/5263195). That file also contains the NEXUS file derived by the authors, which is kept separately here.

In preparing the dataset, Proto-Vietic was treated as a language in its own right, but since cognate sets are not assigned from Proto-Vietic forms, these are left empty. As a result, the data cannot be used directly to check, for example, the correctness of automatic linguistic reconstructions.
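To see which forms carry no cognate judgment, the form and cognate tables can be compared. The following is a minimal sketch, assuming the standard CLDF Wordlist column names (`ID`, `Language_ID`, `Form_ID`, `Cognateset_ID`); the helper name `forms_without_cognates`, the language ID `ProtoVietic`, and the toy rows are placeholders for illustration and are not taken from the dataset.

```python
import csv
import io


def forms_without_cognates(forms_csv, cognates_csv):
    """Group the IDs of forms that no cognate judgment refers to by language."""
    # Collect every form ID that appears in at least one cognate judgment.
    judged = {row["Form_ID"] for row in csv.DictReader(cognates_csv)}
    missing = {}
    for row in csv.DictReader(forms_csv):
        if row["ID"] not in judged:
            missing.setdefault(row["Language_ID"], []).append(row["ID"])
    return missing


# Toy tables with placeholder IDs and forms (hypothetical, not real data).
FORMS = """ID,Language_ID,Parameter_ID,Form
f1,ProtoVietic,c1,aaa
f2,LangA,c1,aab
f3,LangB,c1,abb
"""
COGNATES = """ID,Form_ID,Cognateset_ID
j1,f2,c1-1
j2,f3,c1-1
"""

missing = forms_without_cognates(io.StringIO(FORMS), io.StringIO(COGNATES))
print(missing)  # {'ProtoVietic': ['f1']}
```

With the real data, the two arguments would be open file handles for the form and cognate tables in the cldf directory (the exact file names are an assumption and should be checked against the metadata file).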

Statistics

Glottolog: 100%; Concepticon: 100%; Source: 76%; BIPA: 100%; CLTS SoundClass: 100%

  • Varieties: 33
  • Concepts: 116
  • Lexemes: 3,218
  • Sources: 19
  • Synonymy: 1.00
  • Cognacy: 3,218 cognates in 654 cognate sets (332 singletons)
  • Cognate Diversity: 0.17
  • Invalid lexemes: 0
  • Tokens: 12,661
  • Segments: 449 (0 BIPA errors, 0 CLTS sound class errors, 448 CLTS modified)
  • Inventory size (avg): 54.82

Possible Improvements:

  • Entries missing sources: 760/3218 (23.62%)
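Several of the figures above can be rederived from the raw counts. The sketch below uses only the numbers listed in this section; the formula for cognate diversity, (cognate sets − concepts) / (lexemes − concepts), is an assumption about how the value was computed, although it does reproduce the reported 0.17. The fill rate is an extra derived figure that relies on the reported synonymy of 1.00 (one lexeme per attested variety and concept pair).

```python
# Counts copied from the statistics listed above.
VARIETIES = 33
CONCEPTS = 116
LEXEMES = 3218
COGNATE_SETS = 654
SINGLETONS = 332
TOKENS = 12661
MISSING_SOURCES = 760

# Share of entries without a source, and its complement (source coverage).
missing_share = MISSING_SOURCES / LEXEMES
source_coverage = 1 - missing_share

# Assumed formula: 0 if every concept has one cognate set,
# 1 if every lexeme forms a cognate set of its own.
cognate_diversity = (COGNATE_SETS - CONCEPTS) / (LEXEMES - CONCEPTS)

# Cognate sets with more than one member.
shared_sets = COGNATE_SETS - SINGLETONS

# Average number of sound segments per lexeme.
tokens_per_lexeme = TOKENS / LEXEMES

# Fraction of variety/concept slots that are filled (assumes synonymy 1.00).
fill_rate = LEXEMES / (VARIETIES * CONCEPTS)

print(f"Entries missing sources: {missing_share:.2%}")    # 23.62%
print(f"Source coverage:         {source_coverage:.0%}")  # 76%
print(f"Cognate diversity:       {cognate_diversity:.2f}")  # 0.17
print(f"Non-singleton sets:      {shared_sets}")           # 322
print(f"Tokens per lexeme:       {tokens_per_lexeme:.2f}")
print(f"Fill rate:               {fill_rate:.2f}")
```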

Contributors

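| Name               | GitHub user | Description     | Role   |
|--------------------|-------------|-----------------|--------|
| Paul Sidwell       |             |                 | Author |
| Mark Alves         |             |                 | Author |
| Johann-Mattis List | @LinguList  | CLDF conversion | Other  |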

CLDF Datasets

The following CLDF datasets are available in cldf:
