JulianNeuberger / llm-process-generation

Extracting Process Information from Natural Language Text Using Large Language Models

This is the code for our ER 2024 submission "A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models", as well as material that did not make it into the paper.

Also check out the companion repository for generating BPMN models from PET-annotated documents: https://anonymous.4open.science/r/pet-to-bpmn-poc-B465/README.md

The Excel file results.xlsx contains all results and the raw answer files that contain those results.

Ablation study results

[Figure: Performance for the MD task on PET by number of shots.]

[Figure: Left: Most relevant results of our ablation study. Right: Differences in performance for 10 repeated runs.]

[Figure: Both: Performance when rephrasing prompts; change in wording measured as the cosine similarity of the prompt BOW representation. Minor changes in similarity elicit only minor changes in performance.]

Installation

Requirements

A working conda installation (with mamba) is required; all further dependencies are listed in env.yaml and are installed in the setup step below.

Setup

Create a new conda environment with all the required dependencies and activate it by running:

mamba env create -f env.yaml
conda activate llm-process-generation
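
To sanity-check the environment, you can try importing the OpenAI client (assuming the openai package is among the dependencies in env.yaml, as the API-key requirement below suggests):

python -c "import openai; print(openai.__version__)"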

Download the required datasets from

Running experiments

You can find top-level files for all the experiments we ran:

  • pet_md.py: Mention detection on PET data
  • pet_er.py: Entity resolution on PET data
  • pet_re.py: Relation extraction on PET data
  • vanderaa_md.py: Mention detection on van der Aa data
  • vanderaa_re.py: Constraint extraction on van der Aa data
  • quishpi_md.py: Mention detection on Quishpi data
  • quishpi_re.py: Constraint extraction on Quishpi data
  • analysis.py: Code for the ablation study

Note that running these files requires your OpenAI API key to be set as an environment variable!
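
For example, assuming the scripts read the OpenAI Python client's default variable name OPENAI_API_KEY (an assumption; check the scripts if in doubt):

export OPENAI_API_KEY="sk-..."  # assumed variable name (the OpenAI client's default)
python pet_md.py                # e.g., mention detection on PET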

Prompts and Answers

You can find all the prompts that we used during our experiments (even archived and early ones) in res/prompts/.

Answers for the reported runs can be found in res/answers. You can parse these files using experiments/parse.py: set the path to the answer file near line 347 (answer_file = f"res/answers/analysis/re/no_disambiguation.json") and run the file from the top-level project directory using python -m experiments.parse.
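
A minimal sketch of the two steps (the exact line number may drift as the file changes):

# 1. In experiments/parse.py, near line 347, point the parser at an answer file:
#        answer_file = f"res/answers/analysis/re/no_disambiguation.json"
# 2. Then, from the top-level project directory:
python -m experiments.parse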
