UniversalDependencies / docs

Universal Dependencies online documentation

Home Page:http://universaldependencies.org/

Turkic aorist

dan-zeman opened this issue

What is Tense=Aor supposed to mean? It is not documented in the Turkish documentation of Tense. It occurs in two Turkish treebanks (BOUN and PUD) while it is absent from the other five. It also occurs in Kazakh and Uyghur.
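For anyone who wants to check their own treebank, here is a minimal sketch of how such a count could be made. It assumes standard 10-column CoNLL-U input; the function name `count_feature` and the three-token sample (including its feature values) are made up for illustration and are not taken from any released treebank.

```python
from collections import Counter


def count_feature(conllu_text, feature="Tense=Aor"):
    """Count token lines whose FEATS column contains `feature`, keyed by UPOS."""
    hits = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        feats = cols[5]  # FEATS is the 6th column
        if feats != "_" and feature in feats.split("|"):
            hits[cols[3]] += 1  # UPOS is the 4th column
    return hits


# Hypothetical sample, for illustration only.
sample = (
    "# text = Yağmur yağar.\n"
    "1\tYağmur\tyağmur\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tyağar\tyağ\tVERB\t_\tAspect=Hab|Tense=Aor\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

print(count_feature(sample))
```

Running this over the text of each treebank file would show where the value is actually used and on which parts of speech.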

Following Göksel and Kerslake (2005; page 548), the aorist is “a finite verb form marked by the suffix -(A/I)r (or its negative counterpart -z); the aorist expresses either habitual aspect or various kinds of modality: generalizing, hypothetical, presumptive (with future time reference) or volitional”.

So it does not seem to be a Tense value at all. At least not in the UD sense, although the term is used to denote a tense in the traditional grammar of some other languages, such as Bulgarian. Can we get rid of it in Turkic? Right now it is reported as a validation error, but that could also be mitigated by documenting it as a language-specific value, if needed. However, my impression is that it could be encoded using Aspect=Hab. What do the other Turkish treebanks do with these forms?

In general most treebanks follow the standard Turkological use of Aorist here. In practice it's more like "non past" and has little to do with the aorist in e.g. Greek or Bulgarian, which is a past tense. I'm not sure if Aspect=Hab is right, but @jonorthwash might have a better idea.

I do not think "aorist" is a tense (and it is a very unfortunate name: as @ftyers noted, it has no relation to the better-known usages of the term from Greek and Bulgarian either).

In Turkish the suffix called aorist is an aspect and mood marker. It marks Aspect=Hab and also has a modal effect, which some of the Turkish treebanks mark with Mood=Gen (following Göksel & Kerslake 2005, "generalized modality"; this is the way to express general facts, rules, etc.). I am not sure these two hold for all use cases, but the morphological feature these treebanks mark as Tense=Aor has nothing to do with tense.

My position is that Tense=Aor should not be used at all. It is not a tense marker, and it is not aorist as most people know it. I also tried to bring this up earlier in #191.

In most Turkic languages the "aorist" can mark Aspect=Hab, but it can also mark gnomic aspect (Mood=Gen, basically), present tense, and future tense.

Cf. the following sentences in Kyrgyz using aorist:

  • Мен дүкөнгө барам. "I will/do go to the store." (FUT/HAB ambiguous)
  • Бул кино мага жагат. "I will/do like this movie." (FUT/GNO ambiguous)
  • Жамгыр жаайт. "It will/does rain." (FUT/HAB/GNO ambiguous)

I guess given the right context and adverbials they're all FUT/HAB/GNO ambiguous.

You can also put non-past on auxiliaries:

  • Жамгыр жаап жатат. "It is raining." (the auxiliary adds progressive, aorist adds PRES)
  • Жамгыр жаай баштайт. "It will/does start raining." (the auxiliary adds inchoative, aorist adds FUT/HAB)
  • Мен жаза алам. "I can / will be able to write." (auxiliary adds capability, aorist adds GNO/FUT)

So I do not think replacing all occurrences of it with Aspect=Hab would be right. I see three options:

  1. We could go back through all aorist and try to figure out which meaning was intended based on context,
  2. We could just leave Tense=Aor and change the validation tests,
  3. We could make a new value for tense for "non-past".

I vote strongly against option 1. It would be painstaking at best, and also nearly impossible to get right. It also has nothing to do with how these languages actually work—it's just making up some value based on some other language. Kyrgyz is colonised enough, thank you.

The last two options seem about equivalent to me.

At least for Turkish, I am still strongly against Tense=Aor.

The reason I dislike Tense=Aor is the fact that the suffix (at least in Turkish) does not indicate tense. This is true for some other TAME suffixes as well (e.g., the progressive -(I)yor). The so-called aorist suffix can combine with past tense markers, as in:

  • Orada çok yağmur yağar "it rains a lot there" (repeatedly, and as a general rule/fact)
  • Orada çok yağmur yağardı "it used to rain a lot there" (the only difference is that the habitual/general action is now in the past)

The first one is clearly in present tense (with little potential ambiguity for future in Turkish), and the second one is in past tense. In my opinion, the tense is determined by the presence or absence of past tense suffixes on the verb, -DI above. The "aorist" suffix does not affect the tense, but adds some aspect/mood. So, having a (universal) Tense=Aor does not make sense to me. If we want to copy over the suffixes into the features, I think Aspect or Mood is a more appropriate place to indicate what this suffix does.

We could come up with a standard way of marking this particular suffix, but I can also live with leaving some ambiguity in Aspect/Mood marking, or somehow marking the "most likely" option for Aspect and Mood. Although I like the idea of being able to interpret TAME based on morphological features, these features are rarely relevant to syntax (but they may help disambiguate certain sentences), and there are cases where we will not be able to disambiguate tense anyway. For example,

  • Orada çok yağmur yağarmış "(they say) (it used to) rain a lot there"

is similar to the examples above, but ambiguous between past and present. What is clear from the two suffixes is that they provide Aspect/Mood (-ar) and Evident (-mış). Without extra-sentential context, it is not clear if this is past or present (or future). I'd be comfortable marking this as Tense=Past|Aspect=Hab|Mood=Gen|Evident=Nfh, which is the most likely case. Or maybe leave tense ambiguous (Pres,Past) or unspecified, but Tense=Aor is the least informative and most non-standard choice.
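To make the two alternatives concrete, here is a small sketch of how the proposed feature set would look once serialized for the FEATS column. It assumes the usual CoNLL-U conventions (features sorted alphabetically by name, multiple values of one feature joined with a comma and sorted); the helper names `parse_feats` and `format_feats` are hypothetical, not part of any UD tool.

```python
def parse_feats(feats):
    """Parse a FEATS string into {feature name: sorted list of values}."""
    if feats == "_":
        return {}
    result = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        result[name] = sorted(values.split(","))
    return result


def format_feats(feats):
    """Serialize a feature dict: names sorted case-insensitively, values sorted."""
    if not feats:
        return "_"
    names = sorted(feats, key=str.lower)
    return "|".join(f"{n}={','.join(sorted(feats[n]))}" for n in names)


# The "most likely" reading proposed above for yağarmış.
proposed = parse_feats("Tense=Past|Aspect=Hab|Mood=Gen|Evident=Nfh")
print(format_feats(proposed))
# Aspect=Hab|Evident=Nfh|Mood=Gen|Tense=Past

# The alternative that leaves tense ambiguous between past and present.
ambiguous = dict(proposed, Tense=["Pres", "Past"])
print(format_feats(ambiguous))
# Aspect=Hab|Evident=Nfh|Mood=Gen|Tense=Past,Pres
```

Either way the Aspect, Mood and Evident values stay the same; only the Tense slot differs between the two options.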

Orada çok yağmur yağardı "it used to rain a lot there" (the only difference is that the habitual/general action is now in the past)

Most Turkic languages can't add (some subset of) tense suffixes after the "aorist" suffix, so this is not among the examples I was considering.

I would not consider this example aorist at all, despite having what appears to be an aorist suffix in the TAMVE marking. I would instead say it's unambiguously Tense=Past|Aspect=Hab, as you say, and should be marked that way.

The question I think can be narrowed to the forms marked only with what's labelled "aorist" in a given language. Different Turkic languages slice the relevant TAMVE space in different ways. I started to put together a mini-typology of Turkic "aorist" semantics some time ago that demonstrates this point. One thing that's starting to be clear from it (albeit a very small slice of languages and forms) is that gnomic and habitual don't tend to be distinguished in Turkic languages, and also that different Turkic languages do things differently.

In most Turkic languages the "aorist" can mark Aspect=Hab, but it can also mark gnomic aspect (Mood=Gen, basically), present tense, and future tense.
...
So I do not think replacing all occurrences of it with Aspect=Hab would be right. I see three options:

  1. We could go back through all aorist and try to figure out which meaning was intended based on context,
  2. We could just leave Tense=Aor and change the validation tests,
  3. We could make a new value for tense for "non-past".

I vote strongly against option 1. It would be painstaking at best, and also nearly impossible to get right. It also has nothing to do with how these languages actually work...

I also do not think there should be multiple annotations based on context. If it is one morphological form and not two that incidentally look the same, it should receive one set of features. The features should describe the prototypical usage, even if there are counterexamples where the reading is different from what the features suggest.

The universal feature documentation actually mentions a non-past tense (although the reference from there to the Turkic aorist was not there from the beginning; it was added in 2020). The recommendation there is to reuse Tense=Pres for this purpose. This is certainly not a perfect solution, but I think it has some advantages over Tense=Aor. It at least overlaps with the tenses covered by the Turkic aorist, and it is an existing feature value, so one does not have to define a language-specific label. It also does not prevent us from adding other features (Aspect, Mood) if they contribute to a better understanding of the form. Mood=Gen is also language-specific, but at least it has already been documented. And it is used in all four Turkish treebanks that were released in UD 2.7 (BOUN, GB, IMST, PUD).
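As an illustration of what that recommendation would mean for an individual FEATS value, here is a rough sketch. It is only an assumption about how such a rewrite could be done, not the conversion that was actually applied to any treebank; the function name `replace_aorist` and the input strings are hypothetical, and it handles single-valued features only.

```python
def replace_aorist(feats, extra=()):
    """Replace Tense=Aor with Tense=Pres; optionally add features not yet present."""
    if feats == "_":
        return feats
    pairs = dict(p.split("=", 1) for p in feats.split("|"))
    if pairs.get("Tense") != "Aor":
        return feats  # nothing to change
    pairs["Tense"] = "Pres"
    for item in extra:
        name, value = item.split("=", 1)
        pairs.setdefault(name, value)  # never overwrite an existing value
    # Features are written back sorted by name, case-insensitively.
    return "|".join(f"{k}={pairs[k]}" for k in sorted(pairs, key=str.lower))


print(replace_aorist("Number=Sing|Person=3|Tense=Aor"))
# Number=Sing|Person=3|Tense=Pres

print(replace_aorist("Number=Sing|Person=3|Tense=Aor", extra=("Aspect=Hab", "Mood=Gen")))
# Aspect=Hab|Mood=Gen|Number=Sing|Person=3|Tense=Pres
```

Because the form gets one fixed set of features regardless of context, a rewrite like this is purely mechanical and does not require judging each sentence.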

Closing the issue. As of UD 2.8, Tense=Aor is not used anywhere in UD.