OntoED and OntoEvent
OntoED: A Model for Low-resource Event Detection with Ontology Embedding
This project is the official implementation of the OntoED model and the repository for the OntoEvent dataset, both first proposed in the paper OntoED: Low-resource Event Detection with Ontology Embedding, accepted at ACL 2021.
The implementation is based on Huggingface's Transformers, and the code organization follows MAVEN's baselines & DeepKE.
We also provide several baseline implementations for reproduction.
Brief Introduction
OntoED is a model for event detection under low-resource conditions. It models the relationships between event types through ontology embedding: knowledge from high-resource event types can be transferred to low-resource ones, and unseen event types can establish connections with seen ones via the event ontology.
Project Structure
The structure of data and code is as follows:
```
Reasoning_In_EE
├── README.md
├── OntoED  # model
│   ├── README.md
│   ├── data_utils.py  # for data processing
│   ├── ontoed.py  # main model
│   ├── run_ontoed.py  # for model running
│   └── run_ontoed.sh  # bash file for model running
├── OntoEvent  # data
│   ├── README.md
│   ├── __init__.py
│   ├── event_dict_data_on_doc.json.zip  # raw full ED data
│   ├── event_dict_train_data.json  # ED data for training
│   ├── event_dict_test_data.json  # ED data for testing
│   ├── event_dict_valid_data.json  # ED data for validation
│   └── event_relation.json  # event-event relation data
└── baselines  # baseline models
    ├── DMCNN
    │   ├── README.md
    │   ├── convert.py  # for data processing
    │   ├── data  # data
    │   │   └── labels.json
    │   ├── dmcnn.config  # configure training & testing
    │   ├── eval.sh  # bash file for model evaluation
    │   ├── formatter
    │   │   ├── DmcnnFormatter.py  # runtime data processing
    │   │   └── __init__.py
    │   ├── main.py  # project entrance
    │   ├── model
    │   │   ├── Dmcnn.py  # main model
    │   │   └── __init__.py
    │   ├── raw
    │   │   └── 100.utf8  # word vector
    │   ├── reader
    │   │   ├── MavenReader.py  # runtime data reader
    │   │   └── __init__.py
    │   ├── requirements.txt  # requirements
    │   ├── train.sh  # bash file for model training
    │   └── utils
    │       ├── __init__.py
    │       ├── configparser_hook.py
    │       ├── evaluation.py
    │       ├── global_variables.py
    │       ├── initializer.py
    │       └── runner.py
    ├── JMEE
    │   ├── README.md
    │   ├── data  # to store data file
    │   ├── enet
    │   │   ├── __init__.py
    │   │   ├── consts.py  # configurable parameters
    │   │   ├── corpus
    │   │   │   ├── Corpus.py  # dataset class
    │   │   │   ├── Data.py
    │   │   │   ├── Sentence.py
    │   │   │   └── __init__.py
    │   │   ├── models  # modules of JMEE
    │   │   │   ├── DynamicLSTM.py
    │   │   │   ├── EmbeddingLayer.py
    │   │   │   ├── GCN.py
    │   │   │   ├── HighWay.py
    │   │   │   ├── SelfAttention.py
    │   │   │   ├── __init__.py
    │   │   │   ├── ee.py
    │   │   │   └── model.py  # main model
    │   │   ├── run
    │   │   │   ├── __init__.py
    │   │   │   └── ee
    │   │   │       ├── __init__.py
    │   │   │       └── runner.py  # runner class
    │   │   ├── testing.py  # evaluation
    │   │   ├── training.py  # training
    │   │   └── util.py
    │   ├── eval.sh  # bash file for model evaluation
    │   ├── requirements.txt  # requirements
    │   └── train.sh  # bash file for model training
    ├── README.md
    ├── eq1.png
    ├── eq2.png
    ├── jointEE-NN
    │   ├── README.md
    │   ├── data
    │   │   └── fistDoc.nnData4.txt  # data format sample
    │   ├── evaluateJEE.py  # model evaluation
    │   ├── jeeModels.py  # main model
    │   ├── jee_processData.py  # data process
    │   └── jointEE.py  # project entrance
    └── stanford.zip  # cleaned dataset for baseline models
```
Requirements
- python==3.6.9
- torch==1.8.0 (lower versions may also work)
- transformers==2.8.0
- sklearn==0.20.2
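Assuming a pip-based environment, the pins above could be installed as follows (note that sklearn is published on PyPI as scikit-learn):

```shell
# Install the pinned dependencies; adjust versions if your Python differs.
pip install torch==1.8.0 transformers==2.8.0 scikit-learn==0.20.2
```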
Usage
1. Project Preparation: Download this project and unzip the dataset. You can download the archive directly, or run git clone at your terminal:

```shell
cd [LOCAL_PROJECT_PATH]
git clone https://github.com/231sm/Reasoning_In_EE.git
```
2. Running Preparation: Adjust the parameters in the run_ontoed.sh bash file, and fill in the actual paths for 'LABEL_PATH' and 'RELATION_PATH' at the end of data_utils.py.

```shell
cd Reasoning_In_EE/OntoED
vim run_ontoed.sh
# (set the parameters, then save and quit)
vim data_utils.py
# (set 'LABEL_PATH' and 'RELATION_PATH', then save and quit)
```

Hint:
- Please refer to the main() function in run_ontoed.py for the detailed meaning of each parameter.
- 'LABEL_PATH' and 'RELATION_PATH' are the paths to event_dict_train_data.json and event_relation.json respectively.
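For example, the end of data_utils.py might be edited to look like the sketch below; the paths are placeholders and should point at wherever the OntoEvent files live in your checkout:

```python
# Hypothetical paths; replace with the actual locations of the OntoEvent files.
LABEL_PATH = "../OntoEvent/event_dict_train_data.json"
RELATION_PATH = "../OntoEvent/event_relation.json"
```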
3. Running the Model: Run ./run_ontoed.sh for training, validation, and testing. A folder with the configuration, model weights, and results (in is_test_true_eval_results.txt) will be saved at the path you set via '--output_dir' in the bash file run_ontoed.sh.

```shell
cd Reasoning_In_EE/OntoED
./run_ontoed.sh
# ('--do_train', '--do_eval', '--evaluate_during_training', and '--test' must be set in run_ontoed.sh)
```

Alternatively, you can run run_ontoed.py with the parameters passed manually (they can be copied from run_ontoed.sh):

```shell
python run_ontoed.py --para...
```
About the Dataset
OntoEvent is proposed for event detection (ED) and is also annotated with correlations among events. It contains 13 supertypes with 100 subtypes, derived from 4,115 documents with 60,546 event instances. Please refer to OntoEvent for details.
Statistics
The statistics of OntoEvent are shown below; the detailed data schema can be found in our paper.
| Dataset | #Doc | #Instance | #SuperType | #SubType | #EventCorrelation |
| --- | --- | --- | --- | --- | --- |
| ACE 2005 | 599 | 4,090 | 8 | 33 | None |
| TAC KBP 2017 | 167 | 4,839 | 8 | 18 | None |
| FewEvent | - | 70,852 | 19 | 100 | None |
| MAVEN | 4,480 | 111,611 | 21 | 168 | None |
| OntoEvent | 4,115 | 60,546 | 13 | 100 | 3,804 |
Data Format
The OntoEvent dataset is stored in json format.
πFor each event instance in event_dict_data_on_doc.json
, the data format is as below:
{
'doc_id': '...',
'doc_title': 'XXX',
'sent_id': ,
'event_mention': '......',
'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
'trigger': '...',
'trigger_pos': [, ],
'event_type': ''
}
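As a quick illustration of the instance schema, the sketch below builds a hypothetical instance (the field values are illustrative, not taken from the real dataset) and recovers the trigger string from its token offsets; it assumes trigger_pos holds [start, end) indices into event_mention_tokens, which should be verified against the dataset itself:

```python
# A hypothetical instance following the documented schema; values are made up.
sample_instance = {
    "doc_id": "doc_0001",
    "doc_title": "Example Title",
    "sent_id": 0,
    "event_mention": "Troops attacked the city at dawn .",
    "event_mention_tokens": ["Troops", "attacked", "the", "city", "at", "dawn", "."],
    "trigger": "attacked",
    "trigger_pos": [1, 2],
    "event_type": "Conflict.Attack",
}

def trigger_span(instance):
    """Recover the trigger string from token offsets.

    Assumes trigger_pos is a [start, end) index pair into
    event_mention_tokens; check this convention against the data.
    """
    start, end = instance["trigger_pos"]
    return " ".join(instance["event_mention_tokens"][start:end])

print(trigger_span(sample_instance))  # prints: attacked
```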
For each event relation in event_relation.json, we list the event instance pairs; the data format is as follows:

```
'EVENT_RELATION_1': [
    [
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        },
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        }
    ],
    ...
]
```
In particular, for "COSUPER", "SUBSUPER" and "SUPERSUB", we list event type pairs; the data format is as follows:

```
"COSUPER": [
    ["Conflict.Attack", "Conflict.Protest"],
    ["Conflict.Attack", "Conflict.Sending"],
    ...
]
```
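Since event types are written as "SuperType.SubType", one simple sanity check that can be run over "COSUPER" pairs is that both types share the same supertype segment; the sketch below uses the illustrative pairs shown above:

```python
# Illustrative "COSUPER" pairs in the type-pair format shown above.
cosuper_pairs = [
    ["Conflict.Attack", "Conflict.Protest"],
    ["Conflict.Attack", "Conflict.Sending"],
]

def share_supertype(type_a, type_b):
    # Event types follow the "SuperType.SubType" naming, so two subtypes are
    # co-super exactly when the segment before the first dot matches.
    return type_a.split(".", 1)[0] == type_b.split(".", 1)[0]

print(all(share_supertype(a, b) for a, b in cosuper_pairs))  # prints: True
```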
How to Cite
Thank you very much for your interest in our work. If you use or extend our work, please cite the following paper:
```bibtex
@inproceedings{ACL2021_OntoED,
    title = "{O}nto{ED}: Low-resource Event Detection with Ontology Embedding",
    author = "Deng, Shumin and
      Zhang, Ningyu and
      Li, Luoqiu and
      Hui, Chen and
      Huaixiao, Tou and
      Chen, Mosha and
      Huang, Fei and
      Chen, Huajun",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.220",
    doi = "10.18653/v1/2021.acl-long.220",
    pages = "2828--2839"
}
```