clics / clicsbp

CLDF dataset on Body Part Colexifications

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CLDF dataset on Body Part Colexifications

How to cite

If you use these data please cite this dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a CC-BY-4.0 license.

Notes

The repository includes:

  • data in CLDF
  • commands to compile the colexifications
  • inventory of concepts, Lexibank datasets, and language families
  • table with forms
  • output including graphs, plots, and degrees
  • raw data folder for local clones of the Lexibank repositories
  • scripts for the analyses of ARI and degree values

Installation

To run the code, you need to install the following. We recommend to use a fresh virtual environment.

Clone the GitHub repository in a local folder:

$ git clone https://github.com/clics/clicsbp.git

Change the directory to clicsbp and run:

$ pip install -e .

Check if everything worked by typing cldfbench -h. If you see commands starting with clicsbp., you will be able to run the following code.

In addition, you need the packages from our other reference catalogs (Glottolog, Concepticon, CLTS) and the pyclics package. Make sure to install pyclics by downloading the GitHub repository, checking out the branch colexifications, and then installing the package with pip install -e ..

Download the Lexibank datasets

This command downloads the Lexibank datasets in the local raw folder.

$ cldfbench download lexibank_clicsbp.py

Create CLDF Dataset

To create the CLDF dataset with the colexifications aggregated from the Lexibank word lists, use:

$ cldfbench lexibank.makecldf lexibank_clicsbp.py --concepticon-version=v3.1.0 --clts-version=v2.2.0 --glottolog-version=v4.8

Note that the versions of the reference catalogs change and might need to be adapted in the future.

Colexifications and Analyses

Now,you are able to compute the colexifications and perform the analysis. First, run:

$ cldfbench clicsbp.colexifications
$ cldfbench clicsbp.colexify_all_data

Compute Coverage

Calculate the coverage of the data with:

$ cldfbench clicsbp.coverage

Compute ARI, AMI, and BCubes Statistics

The values can be created for each of the semantic domains by typing:

$ cldfbench clicsbp.ari --tag "human body part"
$ cldfbench clicsbp.ari --tag "emotion"
$ cldfbench clicsbp.ari --tag "color"

Compute Degrees

Examine the degree distributions across, for example, language weight with:

$ cldfbench clicsbp.degrees --weight "language"

Plot Graphs

To create images of the networks with body part colexifications for each of the 20 language families, use:

$ cldfbench clicsbp.plotgraphs --weight=Cognate_Count_Weighted --tag="human body part"

Plot Pie-Charts

The cognitive relations associated with body part colexifications can be explored by creating pie-charts of the data.

$ cldfbench clicsbp.piecharts --weight=Language_Count_Weighted

Statistics

Glottolog: 100% Concepticon: 100% Source: 0% BIPA: 100% CLTS SoundClass: 100%

  • Varieties: 1,028
  • Concepts: 1,500
  • Lexemes: 662,159
  • Sources: 0
  • Synonymy: 1.13
  • Invalid lexemes: 0
  • Tokens: 3,861,880
  • Segments: 1,390 (0 BIPA errors, 0 CLTS sound class errors, 1382 CLTS modified)
  • Inventory size (avg): 44.04

Possible Improvements:

CLDF Datasets

The following CLDF datasets are available in cldf:

About

CLDF dataset on Body Part Colexifications

License:Creative Commons Attribution 4.0 International


Languages

Language:Python 89.6%Language:R 10.4%