
Code and datasets for the ACL 2021 paper "OntoED: Low-resource Event Detection with Ontology Embedding"


OntoED and OntoEvent

OntoED: A Model for Low-resource Event Detection with Ontology Embedding

🍎 This project is the official implementation of the OntoED model and the repository for the OntoEvent dataset, both of which were first proposed in the paper OntoED: Low-resource Event Detection with Ontology Embedding, accepted by ACL 2021.

🤗 The implementation is based on Huggingface's Transformers, and the code organization refers to MAVEN's baselines & DeepKE.

🤗 We also provide some baseline implementations for reproduction.

Brief Introduction

OntoED is a model for event detection under low-resource conditions. It models the relationships among event types through ontology embedding, so that knowledge of high-resource event types can be transferred to low-resource ones, and unseen event types can establish connections with seen ones via the event ontology.

Project Structure

The structure of data and code is as follows:

Reasoning_In_EE
β”œβ”€β”€ README.md
β”œβ”€β”€ OntoED			# model
β”‚   β”œβ”€β”€ README.md
β”‚   β”œβ”€β”€ data_utils.py		# for data processing
β”‚   β”œβ”€β”€ ontoed.py			# main model
β”‚   β”œβ”€β”€ run_ontoed.py		# for model running
β”‚   └── run_ontoed.sh		# bash file for model running
β”œβ”€β”€ OntoEvent		# data
β”‚   β”œβ”€β”€ README.md
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ event_dict_data_on_doc.json.zip		# raw full ED data
β”‚   β”œβ”€β”€ event_dict_train_data.json			# ED data for training
β”‚   β”œβ”€β”€ event_dict_test_data.json			# ED data for testing
β”‚   β”œβ”€β”€ event_dict_valid_data.json			# ED data for validation
β”‚   └── event_relation.json					# event-event relation data
└── baselines		# baseline models
    β”œβ”€β”€ DMCNN
    β”‚   β”œβ”€β”€ README.md
    β”‚   β”œβ”€β”€ convert.py			# for data processing
    β”‚   β”œβ”€β”€ data				# data
    β”‚   β”‚   └── labels.json
    β”‚   β”œβ”€β”€ dmcnn.config		# configure training & testing
    β”‚   β”œβ”€β”€ eval.sh				# bash file for model evaluation
    β”‚   β”œβ”€β”€ formatter
    β”‚   β”‚   β”œβ”€β”€ DmcnnFormatter.py	# runtime data processing
    β”‚   β”‚   └── __init__.py
    β”‚   β”œβ”€β”€ main.py				# project entrance
    β”‚   β”œβ”€β”€ model
    β”‚   β”‚   β”œβ”€β”€ Dmcnn.py		# main model
    β”‚   β”‚   └── __init__.py
    β”‚   β”œβ”€β”€ raw
    β”‚   β”‚   └── 100.utf8		# word vector
    β”‚   β”œβ”€β”€ reader
    β”‚   β”‚   β”œβ”€β”€ MavenReader.py	# runtime data reader
    β”‚   β”‚   └── __init__.py
    β”‚   β”œβ”€β”€ requirements.txt	# requirements
    β”‚   β”œβ”€β”€ train.sh			# bash file for model training
    β”‚   └── utils
    β”‚       β”œβ”€β”€ __init__.py
    β”‚       β”œβ”€β”€ configparser_hook.py
    β”‚       β”œβ”€β”€ evaluation.py
    β”‚       β”œβ”€β”€ global_variables.py
    β”‚       β”œβ”€β”€ initializer.py
    β”‚       └── runner.py
    β”œβ”€β”€ JMEE
    β”‚   β”œβ”€β”€ README.md
    β”‚   β”œβ”€β”€ data				# to store data file
    β”‚   β”œβ”€β”€ enet
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ consts.py		# configurable parameters
    β”‚   β”‚   β”œβ”€β”€ corpus
    β”‚   β”‚   β”‚   β”œβ”€β”€ Corpus.py	# dataset class
    β”‚   β”‚   β”‚   β”œβ”€β”€ Data.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ Sentence.py
    β”‚   β”‚   β”‚   └── __init__.py
    β”‚   β”‚   β”œβ”€β”€ models			# modules of JMEE
    β”‚   β”‚   β”‚   β”œβ”€β”€ DynamicLSTM.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ EmbeddingLayer.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ GCN.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ HighWay.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ SelfAttention.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ ee.py
    β”‚   β”‚   β”‚   └── model.py	# main model
    β”‚   β”‚   β”œβ”€β”€ run
    β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”‚   └── ee
    β”‚   β”‚   β”‚       β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”‚       └── runner.py	# runner class
    β”‚   β”‚   β”œβ”€β”€ testing.py		# evaluation
    β”‚   β”‚   β”œβ”€β”€ training.py		# training
    β”‚   β”‚   └── util.py
    β”‚   β”œβ”€β”€ eval.sh				# bash file for model evaluation
    β”‚   β”œβ”€β”€ requirements.txt	# requirements
    β”‚   └── train.sh			# bash file for model training
    β”œβ”€β”€ README.md
    β”œβ”€β”€ eq1.png
    β”œβ”€β”€ eq2.png
    β”œβ”€β”€ jointEE-NN
    β”‚   β”œβ”€β”€ README.md
    β”‚   β”œβ”€β”€ data
    β”‚   β”‚   └── fistDoc.nnData4.txt	# data format sample
    β”‚   β”œβ”€β”€ evaluateJEE.py			# model evaluation
    β”‚   β”œβ”€β”€ jeeModels.py			# main model
    β”‚   β”œβ”€β”€ jee_processData.py		# data process
    β”‚   └── jointEE.py				# project entrance
    └── stanford.zip			# cleaned dataset for baseline models

Requirements

  • python==3.6.9

  • torch==1.8.0 (lower versions may also work)

  • transformers==2.8.0

  • sklearn==0.20.2
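
One possible environment setup, shown only as a sketch (assuming conda and pip are available; package versions follow the list above, and the pip package name for sklearn is scikit-learn):

conda create -n ontoed python=3.6.9
conda activate ontoed
pip install torch==1.8.0 transformers==2.8.0 scikit-learn==0.20.2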

Usage

1. Project Preparation: Download this project and unzip the dataset. You can either download the archive directly, or run git clone https://github.com/231sm/Reasoning_In_EE.git in your terminal.

cd [LOCAL_PROJECT_PATH]

git clone https://github.com/231sm/Reasoning_In_EE.git

2. Running Preparation: Adjust the parameters in the run_ontoed.sh bash file, and fill in the actual paths for 'LABEL_PATH' and 'RELATION_PATH' at the end of data_utils.py.

cd Reasoning_In_EE/OntoED

vim run_ontoed.sh
(input the parameters, save and quit)

vim data_utils.py
(input the path of 'LABEL_PATH' and 'RELATION_PATH', save and quit)

Hint:
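As a purely illustrative example (the placeholder paths below are assumptions, not values taken from the repository), the two constants at the end of data_utils.py only need to point at your local files:

LABEL_PATH = "[LOCAL_PATH_TO_LABEL_FILE]"                                               # hypothetical placeholder
RELATION_PATH = "[LOCAL_PROJECT_PATH]/Reasoning_In_EE/OntoEvent/event_relation.json"   # assumption: the event-event relation file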

3. Running Model: Run ./run_ontoed.sh for training, validation, and testing. A folder containing the configuration, model weights, and results (in is_test_true_eval_results.txt) will be saved at the path you set with '--output_dir' in the bash file run_ontoed.sh.

cd Reasoning_In_EE/OntoED

./run_ontoed.sh
('--do_train', '--do_eval', '--evaluate_during_training', and '--test' must all be set in 'run_ontoed.sh')

Alternatively, you can run run_ontoed.py and pass the parameters manually on the command line (they can be copied from 'run_ontoed.sh'):

python run_ontoed.py --para... 
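
A hypothetical full invocation, shown only as a sketch (just the flags mentioned above are listed; the complete parameter list should be copied from run_ontoed.sh, and [OUTPUT_PATH] is a placeholder):

python run_ontoed.py \
    --do_train \
    --do_eval \
    --evaluate_during_training \
    --test \
    --output_dir [OUTPUT_PATH]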

About the Dataset

OntoEvent is proposed for ED and also annotated with correlations among events. It contains 13 supertypes with 100 subtypes, derived from 4,115 documents with 60,546 event instances. Please refer to OntoEvent for details.

Statistics

The statistics of OntoEvent are shown below; the detailed data schema can be found in our paper.

Dataset        #Doc    #Instance   #SuperType   #SubType   #EventCorrelation
ACE 2005       599     4,090       8            33         None
TAC KBP 2017   167     4,839       8            18         None
FewEvent       -       70,852      19           100        None
MAVEN          4,480   111,611     21           168        None
OntoEvent      4,115   60,546      13           100        3,804

Data Format

The OntoEvent dataset is stored in JSON format.

πŸ’For each event instance in event_dict_data_on_doc.json, the data format is as below:

{
    'doc_id': '...', 
    'doc_title': 'XXX', 
    'sent_id': , 
    'event_mention': '......', 
    'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
    'trigger': '...', 
    'trigger_pos': [, ], 
    'event_type': ''
}
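
As a quick sanity check, here is a minimal Python sketch for loading one of the splits (the file name comes from the project structure above; the top-level layout of the JSON is an assumption and may be either a flat list of instances or a dict of lists keyed by event type):

import json

# Load an OntoEvent split (adjust the path to your local copy).
with open("OntoEvent/event_dict_train_data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Assumption: the top level is either a list of instances or a dict mapping
# event types to lists of instances; flatten the latter case.
instances = data if isinstance(data, list) else [e for v in data.values() for e in v]
print(len(instances), "event instances")

sample = instances[0]
print(sample["event_type"], sample["trigger"], sample["trigger_pos"])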

πŸ’For each event relation in event_relation.json, we list the event instance pair, and the data format is as below:

'EVENT_RELATION_1': [ 
    [
        {
            'doc_id': '...', 
            'doc_title': 'XXX', 
            'sent_id': , 
            'event_mention': '......', 
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
            'trigger': '...', 
            'trigger_pos': [, ], 
            'event_type': ''
        }, 
        {
            'doc_id': '...', 
            'doc_title': 'XXX', 
            'sent_id': , 
            'event_mention': '......', 
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
            'trigger': '...', 
            'trigger_pos': [, ], 
            'event_type': ''
        }
    ], 
    ...
]

πŸ’Especially for "COSUPER", "SUBSUPER" and "SUPERSUB", we list the event type pair, and the data format is as below:

"COSUPER": [
    ["Conflict.Attack", "Conflict.Protest"], 
    ["Conflict.Attack", "Conflict.Sending"], 
    ...
]
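
Below is a small sketch for inspecting event_relation.json along the lines of the format above (only the three hierarchical keys are named in this README, so everything else is treated as an instance-pair relation; the path is a placeholder for your local copy):

import json

with open("OntoEvent/event_relation.json", "r", encoding="utf-8") as f:
    relations = json.load(f)

HIERARCHICAL = {"COSUPER", "SUBSUPER", "SUPERSUB"}  # pairs of event type names
for rel_name, pairs in relations.items():
    kind = "type pairs" if rel_name in HIERARCHICAL else "instance pairs"
    print(f"{rel_name}: {len(pairs)} {kind}")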

How to Cite

📋 Thank you very much for your interest in our work. If you use or extend our work, please cite the following paper:

@inproceedings{ACL2021_OntoED,
    title = "{O}nto{ED}: Low-resource Event Detection with Ontology Embedding",
    author = "Deng, Shumin  and
      Zhang, Ningyu  and
      Li, Luoqiu  and
      Hui, Chen  and
      Huaixiao, Tou  and
      Chen, Mosha  and
      Huang, Fei  and
      Chen, Huajun",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.220",
    doi = "10.18653/v1/2021.acl-long.220",
    pages = "2828--2839"
}
