cldf / pyigt

Since the distinction is made between grammatical and lexical morphemes, a gloss like 1 or 1SG (not tested) or 1>3 should be categorized as grammatical, just like ERG. I noticed it in the results of corpus.get_wordlist(); I am not sure if this categorization happens elsewhere.

Depends on your perspective. For me, as one who wants to pull out a Swadesh list of the Qiang data, I count them as non-grammatical.

What counts as grammatical and what does not depends on a regex, which can be configured:

pyigt/src/pyigt/igt.py

Lines 43 to 44 in ed75e26

    
    def is_grammatical_gloss_label(self, s):
        return bool((s in ABBRS) or self.label_pattern.match(s))

You can configure this by changing the label_pattern:

pyigt/src/pyigt/igt.py

Line 36 in ed75e26

label_pattern = attr.ib(default=re.compile('^([A-Z]+|([1-3](DL|PL|SG)))$'))

So 1SG is matched, but 1sg would not be matched, and 1 would not be matched either (for good reasons, as it can even be a number).
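To make the matching behaviour concrete, here is a minimal sketch that runs the default pattern quoted above against a few glosses, next to a hypothetical extended pattern that also accepts bare person numbers and combinations such as 1>3. The name EXTENDED_LABEL_PATTERN and its regex are assumptions for illustration only and are not part of pyigt.

```python
import re

# Default pattern as quoted above from pyigt/src/pyigt/igt.py (commit ed75e26).
DEFAULT_LABEL_PATTERN = re.compile('^([A-Z]+|([1-3](DL|PL|SG)))$')

# Hypothetical extension (an assumption, not part of pyigt): also accept
# bare person numbers (1, 2, 3) and agent>patient combinations such as 1>3.
EXTENDED_LABEL_PATTERN = re.compile(
    r'^([A-Z]+|[1-3](DL|PL|SG)?(>[1-3](DL|PL|SG)?)?)$')


def matches(pattern, gloss):
    """Return True if the whole gloss label matches the pattern."""
    return bool(pattern.match(gloss))


if __name__ == '__main__':
    # Print, for each gloss, whether the default and the extended pattern match.
    for gloss in ['ERG', '1SG', '1sg', '1', '1>3', 'dog']:
        print(gloss,
              matches(DEFAULT_LABEL_PATTERN, gloss),
              matches(EXTENDED_LABEL_PATTERN, gloss))
```

Note that the extended pattern would also treat a gloss that is a genuine numeral (1, 2 or 3) as grammatical, which is exactly the trade-off mentioned above.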

Depends on your perspective. For me, as one who wants to pull out a Swadesh list of the Qiang data, I count them as non-grammatical.

…right, I hadn't considered that perspective :)

What counts as grammatical and what does not depends on a regex, which can be configured

So do we modify label_pattern in the igt.py file? Or where in the workflow would that happen?

It depends on your application. If you want to follow the cldf-schema and use the commands interface (which is quite comfortable, but of course tedious if you do it for 50 datasets), you would modify your parameters there. For example, you could specify the pattern in these lines of the cldf-example-repo for lapollaqiang: https://github.com/cldf-datasets/lapollaqiang/blob/40bcba31a65b675a15d2dcac5fae7901619162fc/commands/workflow.py#L13-18 Inside Python scripts you can of course also use it more freely.

Aaah, so I would pass a custom CorpusSpec instance like so?

class MyCorpusSpec(object):
    …
    label_pattern = attr.ib(default=re.compile('^([A-Z]+|([1-3](DL|PL|SG)))$'))
    …

text = Corpus.from_cldf(ds.cldf_reader(), spec=MyCorpusSpec)

    def is_grammatical_gloss_label(self, s):
        return bool((s in ABBRS) or self.label_pattern.match(s))
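The idea of swapping the pattern on a spec object can be sketched without pyigt itself. SpecSketch below is a hypothetical stand-in written with dataclasses, not pyigt's own class, and the three-item ABBRS set is a placeholder for the library's abbreviation list. Whether Corpus.from_cldf expects a spec instance or a class is not confirmed in this thread, so the sketch only shows how a different label_pattern changes the result of is_grammatical_gloss_label.

```python
import re
from dataclasses import dataclass, field

# Placeholder for pyigt's abbreviation list (assumption: the real one is longer).
ABBRS = {'ERG', 'ABS', 'NOM'}


@dataclass
class SpecSketch:
    """Hypothetical stand-in for a corpus spec; not pyigt's implementation."""
    # Same default regex as quoted in the thread.
    label_pattern: re.Pattern = field(
        default=re.compile('^([A-Z]+|([1-3](DL|PL|SG)))$'))

    def is_grammatical_gloss_label(self, s):
        # Mirrors the method quoted above: known abbreviation or regex match.
        return bool((s in ABBRS) or self.label_pattern.match(s))


# Spec with the default pattern.
default_spec = SpecSketch()

# Spec with an illustrative pattern that also treats bare 1, 2, 3 as grammatical.
custom_spec = SpecSketch(
    label_pattern=re.compile(r'^([A-Z]+|[1-3](DL|PL|SG)?)$'))

if __name__ == '__main__':
    for gloss in ['ERG', '1SG', '1', 'dog']:
        print(gloss,
              default_spec.is_grammatical_gloss_label(gloss),
              custom_spec.is_grammatical_gloss_label(gloss))
```

With this design the decision stays a per-corpus setting: someone extracting a Swadesh list can keep the default, while someone counting grammatical morphemes can pass the wider pattern.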

Morphemes glossed with numbers should be grammatical, not lexical morphemes