Scripts and data from the paper:
Ceolin, A., Guardiano, C., Longobardi, G., Irimia, M. A., Bortolussi, L., & Sgarro, A. (2021). At the Boundaries of Syntactic Prehistory. Philosophical Transactions of the Royal Society B. 376: 20200197.
The code contained in this repository is licensed under the MIT license. For the figures and the dataset, please refer to the journal's policies.
The repository contains the following files:
- table.txt: this file contains the names of the languages and their 94 associated features (with values +, -, 0), separated by tabs.
- Table1.csv: this file contains some information about the languages of the dataset.
- Table2.pdf: this file contains the dataset in PDF format.
- dist.py: this Python 3 script takes the file table.txt as input and prints TableS3, which contains the Jaccard distances in matrix format. The syntax is python3 dist.py table.txt. Note that features where either language shows a '0' are ignored for the purposes of the distance computation.
- TableS3: this is the output of python3 dist.py table.txt.
- coord.txt: this file contains the geographic coordinates of the populations sampled.
- gcd.py: this script is used to print Great Circle Distances from coord.txt, by simply calling python3 gcd.py. The output file is TableS5.
- TableS5: this is the output of python3 gcd.py.
- checker.py: a tool to check whether the values in the dataset are compatible with the structural implications. It can be called with python3 checker.py table.txt and it returns a file called errors_table.txt with a list of errors.
- Artificial_Langs: this folder contains the material used for the generation of the Artificial Languages.
- GCD: this folder contains the material used for the syntax/geography correlation.
- Figures: this folder contains the material used to generate the figures in the paper.
- Supplementary_Information: this folder contains the material used to generate the figures in Supplementary Information.
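
As a rough illustration of the two distance computations described for dist.py and gcd.py, here is a minimal sketch. It is not the code in this repository. The function names are hypothetical, the Jaccard variant shown (differences divided by differences plus shared '+' values, so that shared '-' values do not count) is an assumption about the definition used, and the haversine formula with a mean Earth radius of 6371 km is one common way to obtain great-circle distances and may differ from what gcd.py does.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length sequences of '+', '-', '0'.

    Features where either language shows '0' are skipped, as the README states.
    Assumption: shared '-' values are not counted in the denominator.
    """
    shared_plus = 0
    different = 0
    for x, y in zip(a, b):
        if x == "0" or y == "0":
            continue  # ignored for the purposes of the distance
        if x != y:
            different += 1
        elif x == "+":
            shared_plus += 1
    denominator = shared_plus + different
    # With no comparable features the distance is undefined; 0.0 is returned here.
    return different / denominator if denominator else 0.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# One shared '+', one difference, two positions skipped because of '0'.
print(jaccard_distance("++-0", "+-0-"))  # 0.5
# Two antipodal points on the equator: half the Earth's circumference.
print(round(great_circle_km(0.0, 0.0, 0.0, 180.0)))
```

To reproduce the actual tables, use the scripts in the repository; this sketch is only meant to make the descriptions above concrete.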