- An enhanced version of CodeAttack, highly effective against the latest
GPT-4
andClaude-3
series models, will be available next week!
For more details, please refer to our paper ACL 2024.
✨An example run:
python3 main.py --num_sample 3 \
--prompt-type code_python \
--target-model=gpt-4-1106-preview \
--judge-model=gpt-4-1106-preview \
--exp_name=python_stack_full \
--data_key=code_wrapped_plain_attack \
--target-max-n-tokens=1000 \
--judge \
--multi-thread \
--temperature 0 \
--start_idx 0 --end_idx -1
- The 'data' folder contains the CodeAttack datasets curated using AdvBench. There are three versions of CodeAttack, each utilizing different input encoding ways:
data_python_string_full.json
,data_python_list_full.json
, anddata_python_stack_full.json
. - We provide templates for our CodeAttack in both C++ and Go, named
code_C_string.txt
andcode_go_string.txt
, respectively. - Given the limited capability of Llama-2-7B-chat to follow instructions within the code domain, we simplify our CodeAttack in the form of
code_python_list_llama.txt
for evaluation purposes. We also present manual evaluation results of Llama-2-7B-chat in our paper, instead of using GPT-4 as the evaluator.
Overview of our CodeAttack. CodeAttack constructs a code template with three steps: (1) Input encoding which encodes the harmful text-based query with common data structures; (2) Task understanding which applies a decode() function to allow LLMs to extract the target task from various kinds of inputs; (3) Output specification which enables LLMs to fill the output structure with the user’s desired content.
If you find our paper&tool interesting and useful, please feel free to give us a star and cite us through:
@inproceedings{
Ren2024codeattack,
title={Exploring Safety Generalization Challenges of Large Language Models via Code},
author={Qibing Ren and Chang Gao and Jing Shao and Junchi Yan and Xin Tan and Wai Lam and Lizhuang Ma},
booktitle={The 62nd Annual Meeting of the Association for Computational Linguistics},
year={2024},
url={https://arxiv.org/abs/2403.07865}
}