LanLi2017 / data_cleaning_exp

data_cleaning_exp

History Update Problem

Experiment I: OpenRefine Data Cleaning Process

  • Generate six versions of data cleaning processes

Experiment II:

  • LLM-based workflow analysis
  • Chain-of-Table Prompts

LLM-based Data Cleaning

Pipeline:

  1. Run llm_dc.py
  • data input
  • prompt types:
      [1] zero_shot: data cleaning objectives, requirements
      [2] example_based: data cleaning objectives, example repairs, requirements
      [3] example_sample: data cleaning objectives, example repairs, sample rows, requirements
      [4] profile_example_sample: data cleaning objectives, example repairs, sample rows, profiling results, requirements
  • For each prompt type, log the LLM's responses: zero_shot, example_based, example_sample, profile_example_sample
  2. Check the Python scripts in the LLM's responses.
     Question: How does the quality of the response reflect the quality of the prompt?
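The two pipeline steps above can be illustrated with a minimal sketch. It is not the contents of llm_dc.py: the names `PROMPT_COMPONENTS`, `build_prompt`, `extract_scripts`, and `is_valid_python` are hypothetical, and the component keys are assumed labels for the parts listed per prompt type. The first half assembles a prompt from the components each type requires; the second half pulls fenced Python snippets out of an LLM response and checks that they at least parse.

```python
import ast
import re

# Assumed mapping from prompt type to its components, in the order the README lists them.
PROMPT_COMPONENTS = {
    "zero_shot": ["objectives", "requirements"],
    "example_based": ["objectives", "example_repairs", "requirements"],
    "example_sample": ["objectives", "example_repairs", "sample_rows", "requirements"],
    "profile_example_sample": [
        "objectives", "example_repairs", "sample_rows",
        "profiling_results", "requirements",
    ],
}


def build_prompt(prompt_type, parts):
    """Join the components required by prompt_type into one prompt string."""
    sections = [f"[{name}]\n{parts[name]}" for name in PROMPT_COMPONENTS[prompt_type]]
    return "\n\n".join(sections)


# Build the fence marker programmatically so this snippet can itself sit in a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_scripts(response):
    """Return the bodies of all fenced code blocks found in an LLM response."""
    return [body.strip() for body in CODE_BLOCK.findall(response)]


def is_valid_python(script):
    """Syntax check only: True if the script parses, without executing it."""
    try:
        ast.parse(script)
        return True
    except SyntaxError:
        return False
```

A syntax check is only a first filter for step 2. Whether a generated script actually repairs the data still has to be judged by running it against the input, ideally in a sandbox.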

Languages

Python 93.8%, Jupyter Notebook 5.2%, Shell 0.9%