amrisi / amr-guidelines

ms-amr

timjogorman opened this issue · comments

MS-AMR format and release

I've got most of the MS-AMR release ready -- all completed files have been checked against the latest snapshot from Ulf, and checked for a range of issues: overlapping identity chains, changes in what the chains refer to, and whether chains with "i" are consistent with the speaker IDs I've extracted from the ERE data. If anyone has ideas for additional things to test, let me know!

I've been using one format for a few months, but have been trying to hammer out an easy, interpretable format for the release. A given document would have a simple name like "msamr-dfb-023.gold.xml" and two sections. The first would be a declaration of what the "document" is -- a list of the AMRs in the document, plus the speaker and post IDs when available:

   <sentences annotator="anno7" docid="408dff173c599256711f23238e280c15" end="p53" site="LDC" sourcetype="DEFT" start="p49" threadid="bolt-eng-DF-200-192448-6191965">
      <amr id="bolt-eng-DF-200-192448-6191965_0049.1" order="0" post="p49" speaker="jb9191" su="1"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.2" order="1" post="p49" speaker="jb9191" su="2"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.3" order="2" post="p49" speaker="jb9191" su="3"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.4" order="3" post="p49" speaker="jb9191" su="4"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.5" order="4" post="p49" speaker="jb9191" su="5"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.6" order="5" post="p49" speaker="jb9191" su="6"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.7" order="6" post="p49" speaker="jb9191" su="7"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.8" order="7" post="p49" speaker="jb9191" su="8"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.9" order="8" post="p49" speaker="jb9191" su="9"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.10" order="9" post="p49" speaker="jb9191" su="10"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.11" order="10" post="p49" speaker="jb9191" su="11"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.12" order="11" post="p49" speaker="jb9191" su="12"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.13" order="12" post="p49" speaker="hollyone" su="13"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.14" order="13" post="p49" speaker="hollyone" su="14"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.1" order="14" post="p50" speaker="RNBen" su="15"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.3" order="15" post="p50" speaker="xnatalie01x" su="17"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.4" order="16" post="p50" speaker="xnatalie01x" su="18"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.5" order="17" post="p50" speaker="xnatalie01x" su="19"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.1" order="18" post="p51" speaker="Huskaris" su="21"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.2" order="19" post="p51" speaker="Huskaris" su="21"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.3" order="20" post="p51" speaker="Huskaris" su="22"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.4" order="21" post="p51" speaker="Huskaris" su="23"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.5" order="22" post="p51" speaker="Huskaris" su="24"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.6" order="23" post="p51" speaker="Huskaris" su="25"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.7" order="24" post="p51" speaker="Huskaris" su="26"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0052.1" order="25" post="p52" speaker="NeoNerd" su="27"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0052.2" order="26" post="p52" speaker="ed46" su="28"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0053.1" order="27" post="p53" speaker="Arielle" su="29"/>
   </sentences>
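
To give a feel for how a consumer might read this first section, here is a minimal Python sketch using only the standard library. It assumes the attribute names shown above (`order`, `post`, `speaker`); the function names `read_sentences` and `speakers_by_post` are made up for illustration and are not part of any release tooling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Three rows copied from the example above; a real file would list every AMR.
SAMPLE = """
<sentences annotator="anno7" docid="408dff173c599256711f23238e280c15" end="p53" site="LDC" sourcetype="DEFT" start="p49" threadid="bolt-eng-DF-200-192448-6191965">
   <amr id="bolt-eng-DF-200-192448-6191965_0049.1" order="0" post="p49" speaker="jb9191" su="1"/>
   <amr id="bolt-eng-DF-200-192448-6191965_0049.13" order="12" post="p49" speaker="hollyone" su="13"/>
   <amr id="bolt-eng-DF-200-192448-6191965_0050.1" order="14" post="p50" speaker="RNBen" su="15"/>
</sentences>
"""

def read_sentences(xml_text):
    """Return (amr id, post, speaker) tuples in document order."""
    root = ET.fromstring(xml_text.strip())
    amrs = sorted(root.iter("amr"), key=lambda a: int(a.get("order")))
    # speaker is None when the attribute is absent ("when available").
    return [(a.get("id"), a.get("post"), a.get("speaker")) for a in amrs]

def speakers_by_post(rows):
    """Map each post ID to the distinct speakers seen in it, in order."""
    by_post = defaultdict(list)
    for _, post, speaker in rows:
        if speaker not in by_post[post]:
            by_post[post].append(speaker)
    return dict(by_post)

rows = read_sentences(SAMPLE)
print(speakers_by_post(rows))
```

Note that a single post can map to more than one speaker (p49 above has both jb9191 and hollyone), so anything keyed on speaker should go through the `<amr>` rows rather than the post ID.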

Then the identity chains are just explicitly marked as links between variables in each AMR document:

   <relations>
      <identity>
         <identchain relationid="rel-0">
            <mention concept="government-organization" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="g">Protection_Command</mention>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0049.3" parentconcept="take-01" parentvariable="t"/>
         </identchain>
         <identchain relationid="rel-1">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.10" variable="p2"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0051.1" variable="t6"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.4" variable="p"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0051.7" variable="t2"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0051.2" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.11" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.5" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.12" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.3" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0051.6" variable="p"/>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0052.2" parentconcept="attack-01" parentvariable="a"/>
         </identchain>
         <identchain relationid="rel-2">
            <mention concept="hole" id="bolt-eng-DF-200-192448-6191965_0049.10" variable="h"/>
            <mention concept="country" id="bolt-eng-DF-200-192448-6191965_0049.9" variable="c"/>
         </identchain>
         <identchain relationid="rel-3">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.7" variable="p"/>
            <mention concept="police" id="bolt-eng-DF-200-192448-6191965_0049.6" variable="p"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0049.8" variable="t2"/>
         </identchain>
         <identchain relationid="rel-4">
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.11" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.5" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.10" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.7" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.6" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.12" variable="i"/>
         </identchain>
         <identchain relationid="rel-5">
            <mention concept="way" id="bolt-eng-DF-200-192448-6191965_0049.3" variable="w"/>
            <mention concept="route" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="r2"/>
         </identchain>
         <identchain relationid="rel-6">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="p3">Camilla_Duchess_of_Cornwall</mention>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.13" variable="p">Camilla_Duchess_of_Cornwall</mention>
            <mention concept="she" id="bolt-eng-DF-200-192448-6191965_0049.14" variable="s"/>
         </identchain>
         <identchain relationid="rel-7">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0051.3" variable="p"/>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0051.4" parentconcept="avoid-01" parentvariable="a"/>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0051.5" parentconcept="get-04" parentvariable="g"/>
         </identchain>
         <identchain relationid="rel-8">
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0049.3" variable="t2"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0053.1" variable="t2"/>
         </identchain>
         <identchain relationid="rel-9">
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0051.1" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0051.7" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0051.2" variable="i"/>
         </identchain>
      </identity>
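
Two of the checks mentioned at the top (overlapping chains, and "i" chains being consistent with speakers) could be run directly against this format. The following is a rough Python sketch, not the script actually used for the release. It assumes a hypothetical `<document>` root wrapping both sections (no root element is named above), uses shortened AMR ids for readability, and only looks at `<mention>` elements, skipping `<implicitrole>`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy input: <document> is an assumed wrapper and the ids are shortened.
SAMPLE = """
<document>
  <sentences>
    <amr id="d_0049.5" order="0" post="p49" speaker="jb9191" su="5"/>
    <amr id="d_0049.6" order="1" post="p49" speaker="jb9191" su="6"/>
    <amr id="d_0051.1" order="2" post="p51" speaker="Huskaris" su="21"/>
  </sentences>
  <relations>
    <identity>
      <identchain relationid="rel-4">
        <mention concept="i" id="d_0049.5" variable="i"/>
        <mention concept="i" id="d_0049.6" variable="i"/>
      </identchain>
      <identchain relationid="rel-9">
        <mention concept="i" id="d_0051.1" variable="i"/>
      </identchain>
    </identity>
  </relations>
</document>
"""

def overlapping_mentions(root):
    """Return {(amr id, variable): [chain ids]} for mentions in more than one chain."""
    seen = defaultdict(list)
    for chain in root.iter("identchain"):
        for m in chain.findall("mention"):
            seen[(m.get("id"), m.get("variable"))].append(chain.get("relationid"))
    return {key: chains for key, chains in seen.items() if len(chains) > 1}

def mixed_speaker_i_chains(root):
    """Return ids of chains whose "i" mentions come from different speakers."""
    speaker_of = {a.get("id"): a.get("speaker") for a in root.iter("amr")}
    bad = []
    for chain in root.iter("identchain"):
        speakers = {speaker_of.get(m.get("id"))
                    for m in chain.findall("mention")
                    if m.get("concept") == "i"}
        if len(speakers) > 1:
            bad.append(chain.get("relationid"))
    return bad

root = ET.fromstring(SAMPLE.strip())
print(overlapping_mentions(root))
print(mixed_speaker_i_chains(root))
```

A mention is keyed on the pair (AMR id, variable) because variable names such as `p` or `i` repeat across sentences and are only unique within one AMR.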

Finally, we can encode set/member and part/whole relations. Any AMR variables they refer to that aren't already in a coreference chain are listed as singletons:

      <singletons>
         <identchain relationid="singleton-10">
            <mention concept="idiot" id="bolt-eng-DF-200-192448-6191965_0051.3" variable="i"/>
         </identchain>
         <identchain relationid="singleton-12">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="p">Charles_Prince_of_Wales</mention>
         </identchain>
      </singletons>
      <bridging>
         <setmember relationid="rel-11">
            <superset id="rel-1"/>
            <member id="rel-7"/>
            <member id="singleton-10"/>
         </setmember>
         <setmember relationid="rel-13">
            <superset id="rel-8"/>
            <member id="singleton-12"/>
            <member id="rel-6"/>
         </setmember>
      </bridging>
   </relations>
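
Since the bridging relations point at chains by `relationid`, one obvious thing to test is that every `<superset>` and `<member>` reference resolves to a chain declared under `<identity>` or `<singletons>`. Here is a small Python sketch under that assumption; `resolve_setmembers` is a hypothetical helper, the AMR ids are shortened, and only `<setmember>` is handled because no part/whole element appears in the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the rel-13 example above, with shortened AMR ids.
SAMPLE = """
<relations>
  <identity>
    <identchain relationid="rel-6">
      <mention concept="she" id="d_0049.14" variable="s"/>
    </identchain>
    <identchain relationid="rel-8">
      <mention concept="they" id="d_0049.3" variable="t2"/>
    </identchain>
  </identity>
  <singletons>
    <identchain relationid="singleton-12">
      <mention concept="person" id="d_0049.2" variable="p">Charles_Prince_of_Wales</mention>
    </identchain>
  </singletons>
  <bridging>
    <setmember relationid="rel-13">
      <superset id="rel-8"/>
      <member id="singleton-12"/>
      <member id="rel-6"/>
    </setmember>
  </bridging>
</relations>
"""

def resolve_setmembers(root):
    """Return ({relation id: (superset id, [member ids])}, [dangling references])."""
    known = {c.get("relationid") for c in root.iter("identchain")}
    resolved, dangling = {}, []
    for rel in root.iter("setmember"):
        superset = rel.find("superset").get("id")
        members = [m.get("id") for m in rel.findall("member")]
        resolved[rel.get("relationid")] = (superset, members)
        # Any reference that names no declared chain is reported as dangling.
        dangling += [ref for ref in [superset, *members] if ref not in known]
    return resolved, dangling

resolved, dangling = resolve_setmembers(ET.fromstring(SAMPLE.strip()))
print(resolved)
print(dangling)
```

An empty `dangling` list means every bridging reference points at a declared chain or singleton.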

Any suggestions? We want this to feel as obvious and as easy to understand as possible. Some questions:

  • We have the post ID, but don't add any further structure describing the details of threaded discussions -- such as when one AMR is re-quoted. Do we need it?
  • Three very specific wikification errors are still problematic (identity chains having multiple wiki links, where one is definitely wrong):
    • ldcpreferred bolt-eng-DF-200-192451-5796283_0090.8 (snt. 103 in workset dfb-0248): "Promised_Land" should probably be "-" (it's "Pakistan" in context)
    • cjconsensus DF-200-192448-618_9851.11 (snt. 11 in workset dfa-wset-56): "Ozzy_Osbourne" should actually be "Sharon_Osbourne" in context.
    • cjconsensus wb.eng_0009.23 (snt. 23 in workset wb-eng-0009): f / family :wiki "Michael_Jackson" should be f / family :wiki "Jackson_family"
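
Chains like these could be flagged automatically. The sketch below assumes that the text content of a `<mention>` is its wiki title (as the Camilla and Charles examples above suggest) and that an empty body or "-" means no link. The sample chain is invented toy data that mirrors the second bullet, with made-up ids, and `chains_with_conflicting_wiki` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented toy chain (ids are made up); it mirrors the Osbourne case above.
SAMPLE = """
<identity>
  <identchain relationid="rel-0">
    <mention concept="person" id="x_0001.1" variable="p">Ozzy_Osbourne</mention>
    <mention concept="person" id="x_0001.2" variable="p">Sharon_Osbourne</mention>
    <mention concept="she" id="x_0001.3" variable="s"/>
  </identchain>
  <identchain relationid="rel-1">
    <mention concept="country" id="x_0001.4" variable="c">-</mention>
    <mention concept="country" id="x_0001.5" variable="c">Pakistan</mention>
  </identchain>
</identity>
"""

def chains_with_conflicting_wiki(root):
    """Return {chain id: sorted wiki titles} for chains with more than one title."""
    conflicts = {}
    for chain in root.iter("identchain"):
        titles = set()
        for m in chain.findall("mention"):
            text = (m.text or "").strip()
            # Assumption: empty text and "-" both mean "no wiki link".
            if text and text != "-":
                titles.add(text)
        if len(titles) > 1:
            conflicts[chain.get("relationid")] = sorted(titles)
    return conflicts

conflicts = chains_with_conflicting_wiki(ET.fromstring(SAMPLE.strip()))
print(conflicts)
```

This only reports that a chain has conflicting links; deciding which link is the wrong one still needs a human looking at the context.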

Current status:

set        documents   amrs
WB                16    812
DF(LDC)           62   2689
DFB(UCO)          49   2056
DFA(UCO)         139   2163
total            266   7720