This repository contains detailed information and resources for our tutorial at ECIR 2023, held in Dublin, Ireland (April 2023).
Artificial Intelligence (AI), Machine Learning (ML), Information Retrieval (IR) and Natural Language Processing (NLP) are transforming the way legal professionals and law firms approach their work. The significant potential for applying AI to Law, for instance by creating computational solutions for legal tasks, has intrigued researchers for decades. This appeal has only been amplified with the advent of Deep Learning (DL). Working with legal text is far more challenging than in many other subdomains of IR/NLP, mainly due to factors like lengthy documents, complex language and the lack of large-scale datasets. In this tutorial, we shall introduce the audience to the nature of legal systems and texts, and the challenges associated with processing legal documents. We shall then touch upon the history of AI and Law research, and how it has evolved over the years from rudimentary approaches to DL techniques. There will also be a brief introduction to recent state-of-the-art research in general-domain IR and NLP. We shall then discuss specific IR/NLP tasks in the legal domain in more detail, covering their solutions, available tools and datasets, as well as the industry perspective. This will be followed by a hands-on coding/demo session, which is likely to be of great practical benefit to the attendees.
Part | Topic | Presenter | Link to Slides |
---|---|---|---|
1 | Background on legal text | Saptarshi Ghosh | Slides |
2 | Brief history of AI-Law and important milestones | Jack G. Conrad | Slides |
3 | Background on NLP and IR | Pawan Goyal | Slides |
4 | State-of-the-art survey | Debasis Ganguly, Paheli Bhattacharya and Kripabandhu Ghosh | Slides |
5 | Industry perspective | Jack G. Conrad | Slides |
6 | Future directions, advent of LLMs and explainability | Jack G. Conrad, Kripabandhu Ghosh and Saptarshi Ghosh | Slides |
7 | Hands-on coding | Debasis Ganguly, Paheli Bhattacharya, Shounak Paul and Shubham Kumar Nigam | Jupyter Notebook |
This section contains resources for different automation tasks in the legal domain.
This task aims to identify different entities in legal documents. Entities may be classified into different groups that have different legal meanings, such as the parties (appellants, respondents), lawyers, judges and so on.
- A Dataset of German Legal Documents for Named Entity Recognition (Leitner et al., 2020)
- Named Entity Recognition in Indian court judgments (Kalamkar et al., 2022)
The task of summarization in the legal domain aims to generate a gist of the entire case document, either in extractive fashion (selecting the most important sentences) or abstractive fashion (similar to summaries written by humans).
- Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation (Bhattacharya et al., 2022)
- BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization (Sharma et al., 2019)
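
To make the extractive setting concrete, here is a minimal sketch that scores each sentence by the average corpus frequency of its words and keeps the top `k` sentences in their original order. It is a toy baseline for illustration only and is not the method of any paper listed above; the function names, the tiny stopword list and the naive sentence splitter are all assumptions made for this example. Legal text needs a more careful splitter, since abbreviations such as "v." break the naive one.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use fuller lists).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was",
             "that", "by", "for", "on", "it"}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = ("The appellant filed an appeal against the conviction. "
           "The weather was pleasant that day. "
           "The court dismissed the appeal and upheld the conviction.")
    for line in extractive_summary(doc, k=2):
        print(line)
```

An abstractive system would instead generate new sentences, which typically requires a trained sequence-to-sequence model and is beyond a short sketch.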
Broadly speaking, this task aims to determine the outcomes of court cases. In many settings, it may be composed of several sub-tasks, which are addressed in the sections that follow.
- Natural language processing in law: Prediction of outcomes in the higher courts of Turkey (Mumcuoglu et al., 2021)
- Building corpora for the philological study of Swiss legal texts (Hofler et al., 2011)
- ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation (Malik et al., 2021)
- Judicial Decisions of the European Court of Human Rights: Looking into the Crystal Ball (Medvedeva et al., 2018)
- CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction (Xiao et al., 2018)
Often considered a sub-task of Legal Judgment Prediction, this task aims to identify the relevant legal articles and charges given the facts of a case.
- LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents (Paul et al., 2022)
- Hierarchical Matching Network for Crime Classification (Wang et al., 2019)
- Automatic Charge Identification from Facts: A Few Sentence-Level Charge Annotations is All You Need (Paul et al., 2020)
- Charge Prediction with Legal Attention (Bao et al., 2019)
Court case documents are composed of several functional parts, such as Facts, Arguments and Ruling, which may not be clearly demarcated. This task aims to automate the process of segmenting a court case document into these parts.
- Identification of Rhetorical Roles of Sentences in Indian Legal Judgments (Bhattacharya et al., 2019)
- The French Court Decision Structure dataset — FCD12K
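
One common way to frame this task is to assign a label to every sentence and then merge consecutive sentences that share a label into segments. The sketch below shows only that final grouping step, assuming the per-sentence labels have already been produced by some classifier. The function name `group_segments` and the sample sentences are hypothetical; the label names follow the parts mentioned above.

```python
from itertools import groupby


def group_segments(sentences, labels):
    """Merge consecutive sentences with the same label into (label, sentences) segments."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    segments = []
    # groupby only merges adjacent items, so a label that reappears later
    # in the document starts a new segment.
    for label, run in groupby(zip(labels, sentences), key=lambda pair: pair[0]):
        segments.append((label, [sentence for _, sentence in run]))
    return segments


if __name__ == "__main__":
    sentences = [
        "The appellant was arrested in March.",
        "He was charged the following week.",
        "Counsel argued that the arrest was unlawful.",
        "The appeal is dismissed.",
    ]
    # Hypothetical classifier output, one label per sentence.
    labels = ["Facts", "Facts", "Arguments", "Ruling"]
    for label, segment in group_segments(sentences, labels):
        print(label, len(segment))
```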
Recently, there have been many efforts to pre-train large transformer-based language models for the legal domain, and these models have been adapted effectively to many downstream tasks.
- LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., 2020)
- When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings (Zheng et al., 2021)
- Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset (Henderson et al., 2022)
- Pre-training Transformers on Indian Legal Text (Paul et al., 2022)
This is a miscellaneous list of other resources.
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English (Chalkidis et al., 2022)
- Liquid Legal Institute Repository on Legal Text Analytics
- Debasis Ganguly, Lecturer (Assistant Professor), School of Computing Science, University of Glasgow, Glasgow, Scotland
- Jack G. Conrad, Director of Applied Research, Thomson Reuters Labs, Minneapolis, MN, USA
- Kripabandhu Ghosh, Assistant Professor, Department of Computational & Data Sciences, IISER Kolkata, West Bengal, India
- Saptarshi Ghosh, Assistant Professor, Department of Computer Science & Engineering, IIT Kharagpur, West Bengal, India
- Pawan Goyal, Associate Professor, Department of Computer Science & Engineering, IIT Kharagpur, West Bengal, India
- Paheli Bhattacharya, NLP Research Architect, Bosch Research, India
- Shubham Kumar Nigam, Senior Research Fellow, Department of Computer Science & Engineering, IIT Kanpur, Uttar Pradesh, India
- Shounak Paul, Senior Research Fellow, Department of Computer Science & Engineering, IIT Kharagpur, West Bengal, India