Experiment I: OpenRefine Data Cleaning Process
- Generate six versions of data cleaning processes
Experiment II:
- LLM-based workflow analysis
- Chain-of-Table Prompts
Pipeline:
- Run llm_dc.py
- data input
- prompt types:
  - [1] zero_shot: data cleaning objectives, requirements
  - [2] example_based: data cleaning objectives, example repairs, requirements
  - [3] example_sample: data cleaning objectives, example repairs, sample rows, requirements
  - [4] profile_example_sample: data cleaning objectives, example repairs, sample rows, profiling results, requirements
- For each type of prompt, log the LLM's responses:
  - zero_shot
  - example_based
  - example_sample
  - profile_example_sample
- Check the Python scripts in the LLM's responses

Question: How does the quality of the response reflect the quality of the prompt?
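
The four prompt types in the pipeline differ only in which components they include, so they can be assembled from one table. The sketch below is illustrative and is not taken from llm_dc.py: the names `PROMPT_COMPONENTS`, `build_prompt`, `extract_scripts`, and `is_valid_python` are hypothetical, as is the assumption that the LLM returns its scripts inside fenced code blocks.

```python
import ast
import re

# Components per prompt type, in the order listed in the pipeline.
# The dictionary keys for components are hypothetical names.
PROMPT_COMPONENTS = {
    "zero_shot": ["objectives", "requirements"],
    "example_based": ["objectives", "example_repairs", "requirements"],
    "example_sample": [
        "objectives", "example_repairs", "sample_rows", "requirements",
    ],
    "profile_example_sample": [
        "objectives", "example_repairs", "sample_rows",
        "profiling_results", "requirements",
    ],
}


def build_prompt(prompt_type, parts):
    """Join the components that `prompt_type` requires into one prompt string."""
    sections = []
    for name in PROMPT_COMPONENTS[prompt_type]:
        if name not in parts:
            raise KeyError(f"missing component {name!r} for {prompt_type}")
        sections.append(f"## {name}\n{parts[name]}")
    return "\n\n".join(sections)


# Built with string multiplication so this example can itself sit in a fence.
FENCE = "`" * 3
# Assumption: scripts arrive in fenced blocks with an optional language tag.
CODE_BLOCK = re.compile(FENCE + r"[A-Za-z]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_scripts(response):
    """Return the body of every fenced code block found in an LLM response."""
    return [body.strip() for body in CODE_BLOCK.findall(response)]


def is_valid_python(script):
    """Check that a script parses; this says nothing about cleaning quality."""
    try:
        ast.parse(script)
        return True
    except SyntaxError:
        return False
```

A syntax check of this kind is only a first filter when checking the scripts: a script that parses can still apply the wrong repair, so comparing its output against the expected cleaned data would remain a separate step.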