Code_detection

Step 1: Run generate_openai.py to generate code for the questions in the HumanEval dataset (https://github.com/openai/human-eval) using one specific model version, such as chatgpt-3, 3.5, or 4. Results are saved in ./results.

Step 2: For the DNA-GPT baseline, run regenerate_gpt4.py to perform the regeneration used for detection, if the previous step generated code with gpt-4. Then run load_data_gpt4.ipynb to parse the output. Results are saved in ./results.

Step 3: For the DetectGPT4Code result, run fill_in_the_middle.py for the FIM (fill-in-the-middle) task. You can specify the dataset, the FIM model version, and mask_lines. Results are saved in ./results/. The number of FIM perturbations you can generate per run depends on your maximum GPU memory, so you might need to run fill_in_the_middle.py multiple times and merge the results. For example, if fill_in_the_middle.py can only generate 4 perturbations per run, you have to run it 10 times and combine the results to get 40 perturbations. (Currently only a single GPU is supported.)
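The merging described in Step 3 could look roughly like the sketch below. It is only an illustration under assumptions: the function name merge_perturbation_runs, the JSON layout (one list of records per run file, in the same sample order across runs), and the "perturbations" key are all hypothetical and may not match what fill_in_the_middle.py actually writes, so adapt them to the real output files.

```python
import json
from pathlib import Path


def merge_perturbation_runs(run_files, key="perturbations"):
    """Concatenate per-sample perturbation lists from several run files.

    Assumed (hypothetical) format: each file is a JSON list of records,
    every record has a list under `key`, and all files list the same
    samples in the same order.
    """
    merged = None
    for path in run_files:
        records = json.loads(Path(path).read_text())
        if merged is None:
            # Copy the first run so the inputs are never mutated.
            merged = [dict(r, **{key: list(r[key])}) for r in records]
            continue
        if len(records) != len(merged):
            raise ValueError(f"{path} has a different number of samples")
        for target, record in zip(merged, records):
            target[key].extend(record[key])
    return merged if merged is not None else []
```

With 10 run files holding 4 perturbations per sample each, every merged record would carry 40 perturbations, matching the example in Step 3.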

Step 4: Run detect_detectgpt4code.ipynb for detection. The commercial baselines are detect_gptzero.py and detect_openai.py, while my_detector_gpt35or4.ipynb and my_detector_whitebox.ipynb serve as the DNA-GPT baselines.

About

Code for the paper: Zero-Shot Detection of Machine-Generated Codes

https://arxiv.org/abs/2310.05103

Creative Commons Zero v1.0 Universal
