UniversalDependencies / UD_English-EWT

English data

Inconsistent decisions about SCONJ/ADV and mark/advmod for subordinate-clause-introducers

nschneid opened this issue

Especially with "when" and "where":

[three example images omitted]

Should all of these be SCONJ and mark in UDv2? SCONJ guidelines mention "when" introducing a clause.

Note that in GUM, "where" and "when" are always tagged SCONJ, and nearly always attach to a subordinate clause as mark or head a relative clause.

When is the trickier of the two and I can imagine that it will be more SCONJ-like in some sentences (when it alternates with if) and more ADV-like in other sentences (when the meaning is more temporal). I am not sure if a sharp boundary can be established and tested.

However, where is an adverb, and in the above example it is even coreferential with the modified nominal in Sudan.

@dan-zeman If these can be ADV, would they attach to subordinate clauses as mark or advmod?

When they are ADV then they should be advmod. Similar to wh-pronouns, which attach as nsubj, obj or obl, not as mark.

@dan-zeman Should that be added to the rel-upos-mark test in the validator? I see errors for "'mark' should not be 'DET'" and "'mark' should not be 'VERB'" but none regarding ADV.

Relatedly, there are instances of instead/ADV--fixed-> of/SCONJ attaching as mark, and so/ADV attaching as mark. Maybe those are errors too.

I can confirm that the test currently specifically targets (NOUN|PROPN|ADJ|PRON|DET|NUM|VERB|AUX|INTJ), that is, it allows SCONJ with mark, but also ADP, CCONJ, ADV, PART, SYM and X (but not PUNCT, which will be caught by other tests). I don't remember whether being benevolent to adverbs was my idea, or it is a result of a later discussion here (I could imagine that people protested that border cases like when should be allowed). But I don't see any reference to such a discussion in the comments in the source code, so maybe we could try to exclude adverbs from mark – but not before UD 2.6 is released.
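To make the described behaviour concrete, here is a minimal sketch of a check in the spirit of rel-upos-mark. It is not the validator's actual code: the function name `mark_upos_errors` and the `also_forbid` parameter are made up for illustration, and only the list of rejected tags and the message wording come from this thread.

```python
import re

# UPOS tags that the rel-upos-mark test is described above as rejecting.
FORBIDDEN_MARK_UPOS = re.compile(r"^(NOUN|PROPN|ADJ|PRON|DET|NUM|VERB|AUX|INTJ)$")


def mark_upos_errors(conllu_text, also_forbid=()):
    """Return one message per token whose deprel is mark but whose UPOS is rejected.

    also_forbid is a hypothetical knob for trying out a stricter rule,
    e.g. ("ADV",); it is not an option of the real validator.
    """
    errors = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, deprel = cols[1], cols[3], cols[7]
        if deprel.split(":")[0] != "mark":
            continue
        if FORBIDDEN_MARK_UPOS.match(upos) or upos in also_forbid:
            errors.append(f"'mark' should not be '{upos}' (token {cols[0]}, {form})")
    return errors


# "I will go when she comes", with "when" tagged ADV but attached as mark.
# Each row is (ID, FORM, LEMMA, UPOS, HEAD, DEPREL).
ROWS = [
    ("1", "I", "I", "PRON", "3", "nsubj"),
    ("2", "will", "will", "AUX", "3", "aux"),
    ("3", "go", "go", "VERB", "0", "root"),
    ("4", "when", "when", "ADV", "6", "mark"),
    ("5", "she", "she", "PRON", "6", "nsubj"),
    ("6", "comes", "come", "VERB", "3", "advcl"),
]
SAMPLE = "\n".join(
    "\t".join([i, form, lemma, upos, "_", "_", head, deprel, "_", "_"])
    for i, form, lemma, upos, head, deprel in ROWS
)

print(mark_upos_errors(SAMPLE))                        # current, lenient behaviour
print(mark_upos_errors(SAMPLE, also_forbid=("ADV",)))  # stricter variant
```

With the default tag list the when/ADV + mark token passes, which matches the observation that no ADV errors are reported. Passing `also_forbid=("ADV",)` shows what excluding adverbs from mark would flag.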

My knee-jerk reaction to this is to say that distinguishing temporal/conditional 'when' in the syntax tree is unrealistic, and I think that's probably the right thing to say. But paradoxically, this is actually something we could probably distinguish for GUM automatically because we have complete discourse parses distinguishing a relation condition from circumstance and sequence.

But since that's not feasible to use for the other English corpora, I'd say we should just make the interrogative ones be advmod and the subordinate ones mark. I'm happy to implement this for GUM if the consensus is that that would be better.

@amir-zeldes : But if some occurrences of when are advmod and some others mark, it is probably easy to also tag the former ADV and the latter SCONJ, hence the validator could in principle test it, right?

Yes, the tagging and deprel issue are bundled, I just meant that distinguishing the two cases in EWT is non-trivial so probably not a feasible edit.

If anyone does want to try, I'm happy to point out training resources for condition vs. non-condition: there's data from GUM, RST-DT, possibly PDTB (I'd need to check whether it's distinguished there) and other resources. Just in case someone looks into this later, for GUM you can find this information for example here:

or as a query in ANNIS, something like this:

http://corpling.uis.georgetown.edu/annis/#_q=bGVtbWE9IndoZW4iIF9sXyByc2RfcmVsPSJjaXJjdW1zdGFuY2VfciI&_c=R1VN&cl=5&cr=5&s=0&l=10

http://corpling.uis.georgetown.edu/annis/#_q=bGVtbWE9IndoZW4iIF9sXyByc2RfcmVsPSJjb25kaXRpb25fciI&_c=R1VN&cl=5&cr=5&s=0&l=10
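The two links above are hard to read as plain text. As a small sketch, assuming the `_q` and `_c` fragment parameters are Base64 without trailing padding (which is how these two URLs appear to be built), the underlying query and corpus name can be recovered like this. The helper name `decode_annis_link` is made up for illustration.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_annis_link(url):
    """Return (query, corpus) decoded from the _q and _c fragment parameters."""
    params = parse_qs(urlsplit(url).fragment)

    def b64(value):
        # The links drop the trailing "=" padding, so restore it before decoding.
        padded = value + "=" * (-len(value) % 4)
        return base64.urlsafe_b64decode(padded).decode("utf-8")

    return b64(params["_q"][0]), b64(params["_c"][0])


CIRCUMSTANCE_URL = (
    "http://corpling.uis.georgetown.edu/annis/"
    "#_q=bGVtbWE9IndoZW4iIF9sXyByc2RfcmVsPSJjaXJjdW1zdGFuY2VfciI"
    "&_c=R1VN&cl=5&cr=5&s=0&l=10"
)

query, corpus = decode_annis_link(CIRCUMSTANCE_URL)
print(query)
print(corpus)
```

The first link decodes to a query that pairs the lemma "when" with a circumstance relation in the GUM corpus; the second differs only in the relation name.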

The ADV/SCONJ split is now implemented in GUM V6.1.0 and live in the UD dev branch, incl. in UD_English-GUMReddit, and in GU ANNIS. I can run the same depedit script on EWT, but I'm not sure if someone is editing the EWT dev branch concurrently.

If someone wants me to do this I can make a PR, just let me know.

@amir-zeldes I pushed some EWT fixes last night but am done now, so have at it!

OK, implemented in #90

@nschneid Can you review and merge?

@dan-zeman Should SCONJ uses of "when", "where", etc. be PronType=Int or PronType=Rel?

I would think that if they have PronType then they are ADV rather than SCONJ. (The validator will currently not complain if you use PronType with SCONJ; but if it is just a conjunction, then you are not querying a time/location, and you are not referring to a time/location in the matrix clause, so neither Int nor Rel seems to apply.)

I don't think SCONJ is barred from having pronoun features (in some languages, even words like 'if' inflect to agree with subordinate clause subjects), but in English I think it's neither:

  • It's not Int because it's not interrogative (it does not stand in as an equivalent for some phrase in a subsequent response)
  • It's not Rel because it doesn't correspond to a phrase in the matrix clause: "I went to France when de Gaulle was president" (the time is not mentioned in the matrix clause)

So I would say no PronType, though I don't feel strongly about it.

I have some doubts about the analysis of "I went to France when de Gaulle was president." Here I think that "when" refers to a certain time. You could also say "I went to France in the time when de Gaulle was president," and then the time would be mentioned in the matrix clause. It seems elliptical, parallel to "what" in "I give you what you need" vs. "I give you the thing that/which you need."

For me, an example where when is SCONJ and not ADV would be a conditional clause: When (=if) he can become a president, then I can too.

@dan-zeman Are you saying that all temporal clauses are inherently relative because matrix clauses happen at some time?

@amir-zeldes I'm not sure I want to call them relative but I find it natural to consider the wh- thing there to be a relative adverb. On the other hand, it is difficult for me to judge as a non-native speaker. The situation in English is complicated by the fact that the same word, when, is used both in temporal and in conditional clauses (and in any gray-zone cases between the two). If it were Czech, there would be two possible translations of when: kdy vs. když. The former is a temporal adverb, interrogative or relative, and cannot be used in conditional clauses. The latter is a subordinating conjunction and can be used in conditional as well as temporal clauses; I would not tag it as a relative adverb in the temporal ones. This sounds like I'm contradicting myself, but the difference is that in Czech I know that one of the words is ADV and the other one is SCONJ, while in English it is just one word and I'm looking for a way to draw a line between ADV and SCONJ.

The analysis of when-clauses as adverbial headless relative clauses has already been proposed. In French, the equivalent of when, quand, is also a qu-word, like the other relative pronouns.

But the frontier between adverbial wh-words and SCONJ is very subtle and I am not sure it is possible to draw a line. In a sense, when is both an adverbial wh-word and an SCONJ, because adverbial wh-words are particular cases of SCONJ.

I agree that the line is hard to draw, but in many cases you can coordinate when with subjunctions, which may be a reason to analyse it as such:

I went to France when, or just before, de Gaulle became president
I will tell her if and when she turns up.

Would substitutability of "whenever" be a test for conditional vs. temporal?

  • I will tell her when/whenever she turns up.
  • I went to France when/?whenever de Gaulle was president
    • "whenever" kind of works if I am unsure when de Gaulle was president and want to signal this, but a different sense from "at any time that" which is closer to conditional

This distinction potentially interacts with tense/aspect; not sure if that is what we want.

I think we have two separate questions here: should it be PronType=Rel, and can we distinguish SCONJ from ADV for words like 'when' in English.

For the first, I tend to reject Rel, because I think the 'elliptical relative' argument could be applied to other cases we don't want to include, whereas normal relatives don't allow this ellipsis. For example, if we agree that 'if' is SCONJ, then I can create an ad hoc 'relative-like' example, like the 'time' example from @dan-zeman :

  • I will do it under exactly one condition: if and only if you pay me $100

In this example, the word 'condition' functions like 'time' in the other example, but I still think its correct analysis is as advcl to 'do'. Conversely, relative clauses don't generally allow ellipsis of their matrix clause head:

  • I ate a sandwich already
  • I ate already (standard omission of object of eat)
  • I ate a sandwich that I bought earlier
  • ??? I ate that I bought earlier

So at least for English, I would say the two cases are different, and that 'when' is more similar to 'if' than it is to 'that' or 'which'.

Then about the ability to distinguish POS tags (in English), I would say we should use ADV in main clause interrogatives, or reported interrogatives inside ccomp, where the analysis as SCONJ is not motivated (since English does not use complementizers for standard main clauses):

  • When/ADV will she come/root?
  • I asked when/ADV will she come/ccomp?

For the advcl case, I think we can use SCONJ:

  • I will go when/SCONJ she comes

Also note that the POS tag can actually disambiguate two constructions:

  • I will decide when/SCONJ she comes/advcl (the decision will happen at that time)
  • I will decide when/ADV she comes/ccomp (the thing being decided is the time she comes)

But if people don't agree on this, I'd much prefer to have only ADV than only SCONJ, since in a main clause SCONJ seems really wrong to me.
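The rule proposed in this comment can be written down as a tiny lookup. This is only an illustrative sketch: `tag_wh_introducer` is a made-up helper, and it covers just the three clause types named above (main clause, ccomp, advcl).

```python
# Hypothetical helper illustrating the proposal above; not part of any UD tool.
def tag_wh_introducer(clause_deprel):
    """Map the deprel of the clause "when" introduces to (UPOS, deprel) for "when"."""
    base = clause_deprel.split(":")[0]
    if base in ("root", "ccomp"):
        # Main-clause or reported interrogative: no complementizer reading.
        return ("ADV", "advmod")
    if base == "advcl":
        # Adverbial clause: "when" behaves like "if".
        return ("SCONJ", "mark")
    return None  # clause types the comment does not discuss


EXAMPLES = [
    ("When will she come?", "root"),
    ("I asked when will she come?", "ccomp"),
    ("I will go when she comes", "advcl"),
]

for sentence, clause_deprel in EXAMPLES:
    print(sentence, "->", tag_wh_introducer(clause_deprel))
```

The "I will decide when she comes" pair shows why the clause relation has to be the input: the same string gets SCONJ/mark under the advcl reading and ADV/advmod under the ccomp reading.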

Fixed in #90

Reopening this because the current policy produces odd results. First, interrogative determiners are always WDT and DET, not SCONJ. Why should interrogative adverbs be different? The first result from the linked query illustrates this nicely, suggesting the bracketing:

  • opinions on [how/SCONJ/mark [it happened] and [[what/DET/det effect] Chernobyl will have]]

Whereas the structure I would assign is:

  • opinions on [[how/ADV/advmod it happened] and [[what/DET/det effect] Chernobyl will have]]

Second, "how" can modify an adjective xcomp rather than the main embedded predicate: "I can't tell you [how ominous] I found Bush's performance". It can even modify an adjective of an embedded object: "it depends on [[how good] a horse] you have".* mark doesn't attach to the main embedded predicate here, which is counterintuitive to me; advmod seems more correct.

* Moreover, in this construction "how" is the interrogative equivalent of "as", which is ADV/advmod: "I have [[as good] a horse]."

Agreed, the first "opinions" analysis is wrong for me, since I think it's a free relative. It should be:

nmod(opinions, how)
acl:relcl(how, happened)

And then it's clear that "how" is not mark. But even for correct free relative trees with WRB, the current upos is wrong, and should indeed be ADV IMO, just like "opinions on what" would be what/PRON.

Actually after reading up on free relatives I don't think it is one, I think it's an interrogative content clause. But this is a complicated discussion so let's do that in a meeting.

Sure, happy to talk more about it when we meet - FWIW I think a content clause would be acl (the opinion that...), whereas this one conflates two functions, like a "what" free relative (it's short for a separable "opinions about the way/nmod, how..."). So we prefer the matrix function, and tag "how" as nmod IMO.

Whew, so today we decided to dispense with SCONJ/mark and always use ADV/advmod for "where", "when", "why", and "how". The best argument in my mind is a sentence like

  • I wonder how much money he has

(interrogative complement clause with predicate "has")—mark(much, how/SCONJ) would be decidedly weird, because the SCONJ usually marks the predicate itself. This brings WH words in complement clauses in line with main clauses (and relative clauses).

Implementing this change should be pretty straightforward since we are removing a putative distinction, not adding one.
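As a rough sketch of what removing the distinction could look like on CoNLL-U text (this is not the depedit script mentioned earlier in the thread, and `retag_wh_adverbs` is a made-up name), the four words can be retagged in one pass. The sketch assumes lemmas are reliable and leaves XPOS, FEATS and the enhanced DEPS column untouched, so those would need separate handling.

```python
WH_ADVERBS = {"where", "when", "why", "how"}


def retag_wh_adverbs(conllu_text, lemmas=WH_ADVERBS):
    """Turn SCONJ/mark into ADV/advmod for the given lemmas; other lines pass through."""
    out = []
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Only plain token lines have ten columns and an integer ID.
        if len(cols) == 10 and cols[0].isdigit() and cols[2].lower() in lemmas:
            if cols[3] == "SCONJ":
                cols[3] = "ADV"
            if cols[7] == "mark":
                cols[7] = "advmod"
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


# "how" from "I wonder how much money he has", in the analysis being retired.
BEFORE = "\t".join(["3", "how", "how", "SCONJ", "WRB", "_", "4", "mark", "_", "_"])
print(retag_wh_adverbs(BEFORE))
```

Because the rewrite only looks at the token's own lemma, UPOS and deprel, it does not need to know what kind of clause the word introduces, which is the sense in which dropping a distinction is easier than adding one.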

Actually we had kept "how/advmod ADJ" (how much etc.) in GUM even after the shift to mark for the independent WH subordinators, so that doesn't need to change. I do wonder whether we have thought this through though: do we have an exhaustive list of the ones that should be advmod? Like, how about "once" or "while"?

I think we only were talking about WH adverbs. "While" makes sense as SCONJ because it always introduces a clause, right? "Once" can be an adverb (unlike "if", "while", "that") but I'm not sure whether that means we shouldn't call it SCONJ where it introduces a clause.

If "once" is SCONJ when introducing a clause (i.e. stands where 'if' can stand) then I don't really understand why we would want "when" to be advmod... Not that I feel really strongly about any of this (it's automatable and therefore doesn't mean much), but it feels arbitrary to me. If I'm being honest, I don't think there is a syntactic difference between "if we go" and "when we go".

Well, I don't think the ADV vs. SCONJ dichotomy for the words we're discussing is a perfect fit for English, but we're stuck with it. :) I could go either way on "once". I just want all the WH-words to be ADV so that "I wonder how you feel", "I wonder how much money you make", "I wonder where you live", "I wonder when it ends", etc. all end up the same, and match the main clause versions of those subordinate clauses ("How do you feel?", "How much money do you make?", "Where do you live?", "When does it end?").

If we wanted to be very particular we could make an argument that "once" has been grammaticalized from an adverb to a subordinator, so it should be ADV/mark in these constructions. But we haven't gone down that road (yet) for prepositions functioning as subordinators. For now probably best to say "once" is SCONJ/mark.

OK, I can live with that

BTW, not exactly related but since I'm looking at the (e)deps, did we settle on flip-flopping when/where back to advmod where they were previously mark? If so, do you have an exhaustive list of the items this applies to? I have:

when|where|whither|whence|while|why

Originally posted by @amir-zeldes in #346 (comment)

In at least some cases, yes. how, when, Xever, ...

From above I think we decided "while" is SCONJ.

Mm, so 'while' stays SCONJ/mark (in all contexts), and when|where|whither|whence|why are ADV/advmod?

Is there ever a context in which 'while' is ADV/advmod or one of the others above is SCONJ/mark?

Not that I know of? They are all WH-words except for "while".

OK, I can make this happen in GUM, but looking at EWT dev it's currently not this way. Is this change pending in EWT?
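One way to see how far a treebank is from the policy summarized above is to tally the tokens that deviate from it. The sketch below is an assumption-laden illustration: `EXPECTED` just encodes the list from this exchange, `audit` is a made-up name, and any hit is only a candidate for manual review (for instance, "while" can legitimately be a NOUN, as in "a while").

```python
from collections import Counter

# Policy as summarized in this exchange: WH adverbs are ADV/advmod, "while" is SCONJ/mark.
EXPECTED = {
    lemma: ("ADV", "advmod")
    for lemma in ("when", "where", "whither", "whence", "why")
}
EXPECTED["while"] = ("SCONJ", "mark")


def audit(conllu_text):
    """Count (lemma, UPOS, deprel) combinations that differ from EXPECTED."""
    mismatches = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # comments, blank lines, multiword ranges, empty nodes
        lemma = cols[2].lower()
        actual = (cols[3], cols[7])
        if lemma in EXPECTED and actual != EXPECTED[lemma]:
            mismatches[(lemma, *actual)] += 1
    return mismatches


SAMPLE = "\n".join(
    "\t".join(cols)
    for cols in [
        ["4", "when", "when", "SCONJ", "WRB", "_", "6", "mark", "_", "_"],
        ["2", "where", "where", "ADV", "WRB", "_", "5", "advmod", "_", "_"],
        ["1", "While", "while", "SCONJ", "IN", "_", "3", "mark", "_", "_"],
    ]
)

for (lemma, upos, deprel), count in audit(SAMPLE).items():
    print(lemma, upos, deprel, count)
```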

Docs are now updated: SCONJ, ADV, advmod