ppke-nlpg / vframe

A method for constraining possible verbal frames based on the preverb and the infinitival argument for Hungarian verbs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

VFrame

VFrame is a method for constraining possible verbal frames based on the verbal particle and the infinitival argument of Hungarian verbs. A detailed description of the VFrame algorithm can be found in English in Indig & Vadász 2016 (see bibliographical data at the end of the this README).

Structure

  • the main folder contains 3 scripts, these must be executed in the following order:
    • preprocess_input_for_magyarlanc.py: it preprocesses the test file to the magyarlanc dependency parser (after that, the magyarlanc system must be used with the temp/to_magyarlanc input, resulting in the temp/to_magyarlanc_out)
    • annotate.py: it runs VFrame searcher
    • eval.py: it evaluates the results of VFrame searcher
  • the temp folder stores temporal data produced by the 3 scripts
  • the manocska folder contains data coming from the Manócska database
  • the test_data folder contains the test set (final_test.txt) and the gold standard (only_manually_annotated.txt)

Evaluation

Our test set (test_data/final_test.txt) contains 1000 clauses extracted from the Hungarian Gigaword Corpus 2.0.4. The clauses are selected according to the following criteria:

  • the clause has to contain exactly one finite verb,
  • in addition to this, it must have at least one verbal particle OR an infinitive,
  • the finite as well as the infinite verb can be a particle verb (thus, they particle is written together with the verb).

'Verbal particle (PRT) - finite verb (FIN) - infinite verb (INF)' combinations with their frequencies and original examples in Hungarian:

clauses containing a detached verbal particle:

  • detached PRT & no INF: 573
    • e.g. de az élmény csak ideig-óráig villanyozta(FIN) fel(PRT).
  • detached PRT & INF without PRT: 65
    • e.g. a köztes állásokat ki(PRT) kell(FIN) következtetni(INF) a szomszédosakból,
  • detached PRT & INF with PRT: 3
    • e.g. Egycsatáros játékkal próbált(FIN) meg(PRT) sikert elérni(PRT+INF) a Celtic Milánóban,

clauses NOT containing any detached verbal particles:

  • FIN without PRT & INF without PRT: 225
    • e.g. az alkotmányt is módosítani(INF) kellene(FIN),
  • FIN without PRT & INF with PRT: 120
    • e.g. Nem tudtam(FIN) megítélni(PRT+INF),
  • FIN with PRT & INF without PRT: 10
    • e.g. hogy elkezdtünk(PRT+FIN) gondolkozni(INF)
  • FIN with PRT & INF with PRT: 4
    • e.g. és az egyikbe tényleg megpróbál(PRT+FIN) több utast beültetni(PRT+INF) a droszton sürgölődő hosztesz.

References used in the README:

  • Indig, Balázs – Vadász, Noémi (2016): Windows in Human Parsing -- How Far can a Preverb Go? Tenth International Conference on Natural Language Processing (HrTAL2016), Dubrovnik, Croatia, September 29--30, 2016.
  • Hungarian Gigaword Corpus (Magyar Nemzeti Szövegtár 2)
  • Oravecz, Csaba – Váradi, Tamás – Sass, Bálint (2014): The Hungarian Gigaword Corpus. In: Proceedings of LREC 2014. Reykjavík. 1719–1723.
  • Zsibrita, János – Vincze, Veronika – Farkas, Richárd (2013): magyarlanc: A Toolkit for Morphological and Dependency Parsing of Hungarian. In: Proceedings of RANLP 2013, pp. 763–771.

Licence

It can be used for education, research and private projects. In case you use VFrame, please cite one of the following articles:

Vadász Noémi, Kalivoda Ágnes, Indig Balázs. Egy egységesített magyar igei vonzatkerettár építése és felhasználása. XIV. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2018). 3--15. Szeged. 2018.

@inproceedings{vadasz_kalivoda_indig_2018a,
    title = {Egy egys{\'e}ges{\'i}tett magyar igei vonzatkerett{\'a}r {\'e}p{\'i}t{\'e}se {\'e}s felhaszn{\'a}l{\'a}sa},
    booktitle = {XIV. Magyar Sz{\'a}m{\'i}t{\'o}g{\'e}pes Nyelv{\'e}szeti Konferencia (MSZNY 2018)},
    year = {2018},
    pages = {3{\textendash}15},
    publisher={Szegedi Tudom{\'a}nyegyetem Informatikai Tansz{\'e}kcsoport},
    organization = {Szegedi Tudom{\'a}nyegyetem Informatikai Int{\'e}zet},
    address = {Szeged},
    author = {Vad{\'a}sz, No{\'e}mi and Kalivoda, {\'A}gnes and Indig, Bal{\'a}zs},
    editor = {Vincze, Veronika}
}

Indig, Balázs and Vadász, Noémi. Windows in Human Parsing -- How Far can a Preverb Go? Tenth International Conference on Natural Language Processing (HrTAL2016), Dubrovnik, Croatia, September 29--30, 2016.

@conference {indig_vadasz_2016b,
    title = {Windows in Human Parsing {\textendash} How Far can a Preverb Go?},
    booktitle = {Tenth International Conference on Natural Language Processing (HrTAL2016) 2016, Dubrovnik, Croatia, September 29-30, 2016, Proceedings},
    year = {2016},
    note = {to appear},
    publisher = {Springer},
    organization = {Springer},
    address = {Cham},
    keywords = {indba, vadno},
    author = {Indig, Bal{\'a}zs and Vad{\'a}sz, No{\'e}mi},
    editor = {Tadi{\'c}, Marko and Bekavac, Bo{\.z}o}
}

About

A method for constraining possible verbal frames based on the preverb and the infinitival argument for Hungarian verbs

License:GNU Lesser General Public License v3.0


Languages

Language:Python 100.0%