bodhibudd / LEVEN

Source code and dataset for the ACL 2022 Findings paper "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset"

LEVEN

Dataset and source code for the ACL 2022 Findings paper "LEVEN: A Large-Scale Chinese Legal Event Detection Dataset".

Background

Events are the essence of the facts in legal cases. Therefore, Legal Event Detection (LED) is fundamentally important and naturally beneficial to case understanding and other Legal AI tasks.

[Figure: bg]

Overview

The dataset can be obtained from Tsinghua Cloud or Google Drive. The annotation guidelines are provided in Annotation Guidelines. You can also check out our poster at the ACL 2022 main conference.

We deliberately removed the annotations from the test set. To obtain results on the LEVEN test set, please refer to the Leaderboard section.
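As a rough illustration of how the downloaded data might be inspected, the sketch below counts trigger mentions per event type. It assumes the files are in JSON Lines format and that each document carries an `events` list whose items have `type` and `mention` fields; these field names are assumptions for illustration and should be checked against the actual files and the Annotation Guidelines. The sample records are made up.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed document per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_event_types(docs: Iterable[dict]) -> Counter:
    """Count trigger mentions per event type.

    The "events", "type" and "mention" keys are assumed field names,
    not confirmed by this README.
    """
    counts: Counter = Counter()
    for doc in docs:
        for event in doc.get("events", []):
            counts[event["type"]] += len(event.get("mention", []))
    return counts


# Made-up in-memory records standing in for lines read from a data file.
sample_lines = [
    json.dumps({"id": "doc-1", "events": [
        {"type": "theft", "mention": [{"trigger_word": "stole"}]},
        {"type": "injury", "mention": [{"trigger_word": "hit"}]},
    ]}),
    "",  # blank lines are skipped
    json.dumps({"id": "doc-2", "events": [
        {"type": "theft", "mention": [{"trigger_word": "took"}]},
    ]}),
]

type_counts = count_event_types(iter_documents(sample_lines))
print(type_counts.most_common())
```

To run this on a real file, pass an open file handle instead of `sample_lines`, for example `iter_documents(open(path, encoding="utf-8"))`, where `path` points to one of the downloaded files.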

Large Scale

LEVEN is the largest Legal Event Detection dataset and the largest Chinese Event Detection dataset. Here is a comparison between the scale of LEVEN and other datasets.

[Table: tab1]

Datasets denoted with * are not publicly available, and – means the value is not available.

High Coverage

LEVEN contains 108 event types in total, including 64 charge-oriented events and 44 general events. Their distribution is shown below.

[Table: tab2]

The LEVEN event schema has a sophisticated hierarchical structure, which is shown here.

Leaderboard

LEVEN is adopted for CAIL 2022, the most influential Legal AI contest in China.

You can submit your predictions to the CAIL Event Detection Track to win a prize of up to CNY 15,000!

Please follow the submission instructions here.

Experiments

The source code for the experiments is included in the Baselines and Downstreams folders.

Baselines

We implement six competitive baselines; their performance is shown below.

[Table: tab3]

Downstream Tasks

We also explore the use of LEVEN on two downstream tasks. We simply use events as side information to improve the performance of Legal Judgment Prediction and Similar Case Retrieval.
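One simple way to picture "events as side information" is to mark detected triggers in the case text before it is passed to a downstream model. The sketch below is only an illustration of that idea and is not necessarily how the code in the Downstreams folder works; the function name `tag_triggers`, the bracketed marker format, and the example tokens are all hypothetical.

```python
from typing import List, Tuple


def tag_triggers(tokens: List[str], triggers: List[Tuple[int, str]]) -> List[str]:
    """Insert an event-type marker after each trigger token.

    `triggers` holds (token_index, event_type) pairs, e.g. as produced by
    an event detection model. The marker format is a hypothetical choice.
    """
    type_at = dict(triggers)
    tagged: List[str] = []
    for index, token in enumerate(tokens):
        tagged.append(token)
        if index in type_at:
            tagged.append(f"[{type_at[index]}]")
    return tagged


# Made-up example: two triggers detected in a short fact description.
tokens = ["he", "stole", "a", "phone", "and", "fled"]
triggers = [(1, "theft"), (5, "escape")]

tagged_tokens = tag_triggers(tokens, triggers)
print(" ".join(tagged_tokens))
```

The tagged sequence can then be fed to any text encoder in place of the raw tokens. Injecting event information at the embedding level is another option.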

The experiment results for Legal Judgment Prediction are shown below.

[Table: tab4]

The experiment results for Similar Case Retrieval are shown below.

[Table: tab5]

Schema

The Chinese event schema is shown below. Please check our paper for the English version.

The detailed explanation and annotation guidelines are provided in Annotation Guidelines.

[Figure: schema]

Citation

If the data and code help you, please cite this paper.

@inproceedings{yao-etal-2022-leven,
    title = "{LEVEN}: A Large-Scale {C}hinese Legal Event Detection Dataset",
    author = "Yao, Feng and Xiao, Chaojun and Wang, Xiaozhi and Liu, Zhiyuan and Hou, Lei and Tu, Cunchao and Li, Juanzi and Liu, Yun and Shen, Weixing and Sun, Maosong",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    year = "2022",
    url = "https://aclanthology.org/2022.findings-acl.17",
    doi = "10.18653/v1/2022.findings-acl.17",
    pages = "183--201",
}
