httdty / PURPLE_ICDE

🟣 PURPLE

Code for the paper PURPLE: Making a Large Language Model a Better SQL Writer.

Overview: PURPLE is designed to help large language models generate SQL queries more efficiently and accurately.

Dataset Download

Unzip the data and organize it into the following layout:

spider
├── database
├── dev.json
├── train_spider_pruned.json
└── tables.json
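
Before running anything, it can help to confirm that the unzipped data matches the layout above. The following is a minimal sketch, not part of the repository: the function name `missing_spider_entries` is hypothetical, and it assumes the `spider` directory sits in the current working directory, which the README does not specify.

```python
from pathlib import Path

# Entry names copied from the layout listed above.
EXPECTED_ENTRIES = ["database", "dev.json", "train_spider_pruned.json", "tables.json"]


def missing_spider_entries(spider_dir):
    """Return the expected entries that are absent from spider_dir."""
    root = Path(spider_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Assumption: "spider" is relative to the current working directory.
    missing = missing_spider_entries("spider")
    if missing:
        print("Missing from spider/:", ", ".join(missing))
    else:
        print("spider/ layout looks complete.")
```

The check only looks at names; it does not validate the contents of the JSON files or the databases.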

Environment Build

We publish a Docker image to make the experiments easier to reproduce. You can obtain the image and start a container with:

docker pull thren20/purple:v3
docker run -itd --rm --name YOUR_CONTAINER_NAME --mount type=bind,source=PATH_TO_YOUR_CODE,target=/workspace/ thren20/purple:v3

NOTE: The trained models are also included in the docker image.

You can also build the environment without Docker; the required packages are listed in requirements.txt. We provide an environment-building script, env.sh:

chmod 744 env.sh
bash env.sh
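
After a non-Docker setup, one way to spot packages that failed to install is to compare requirements.txt against the active Python environment. This is a rough sketch under stated assumptions, not something the repository ships: `missing_requirements` is a hypothetical helper, it assumes requirements.txt uses plain `name==version` style lines, and it only checks that each package is present, not that the pinned version matches.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(lines):
    """Return package names from requirement lines that are not installed."""
    missing = []
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # Assumption: the line starts with the distribution name.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        print("Not installed:", missing_requirements(req_file.read_text().splitlines()))
```

Lines that point at URLs or local paths are not handled by this simple name match and would need separate treatment.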

Pipeline

To reproduce the experiments in the paper, run the provided pipeline script:

chmod 744 script/infer_pipeline.sh
bash script/infer_pipeline.sh
