AnthenaMatrix / The-I-Exemption-Bypassing-LLM-Ethical-Filters

The "I" Exemption, is a curious behavior in some LLMs. We discover how these AI systems might shy away from directly assisting with unethical actions if you ask in the first person ("I"). But with a clever rephrase to a general scenario ("they"), they might spill the beans and explain the unethical method.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The 'I' Exemption: Bypassing LLM Ethical Filters

The "I" Exemption, is a curious behavior in some LLMs. We discover how these AI systems might shy away from directly assisting with unethical actions if you ask in the first person ("I"). But with a clever rephrase to a general scenario ("they"), they might spill the beans and explain the unethical method.

Overview

This research explores a phenomenon we've termed the "I" Exemption, a quirk in the behavior of some large language models (LLMs) regarding ethical responses. We observed that by phrasing queries in the first person (using "I"), the LLM might trigger a filter that prevents it from directly assisting with unethical actions. However, by rephrasing the question to a third-person perspective (using "people"), the filter appears bypassed, and the LLM may explain the unethical method while still emphasizing its wrongdoings.

Understanding the "I" Exemption

While the exact reasoning behind this behavior remains under investigation, we hypothesize a few possibilities:

  • Protecting Users: The filter might be designed to prevent users from directly carrying out harmful actions through the AI's assistance.
  • Avoiding Liability: By refusing tasks with "I," the LLM could be safeguarding itself and its developers from potential legal or ethical repercussions.
  • Simpler Logic: It's also possible that the system is programmed to handle declarative statements ("people engage") more easily than subjective ones ("I engage").

Bypass Examples

  • ChatGPT 4: Example of the technique being used on ChatGPT4

  • ChatGPT 3.5: Example of the technique being used on ChatGPT 3.5

  • Gemini: Example of the technique being used on Gemini

Importance and Considerations

While the "I" Exemption highlights a programming quirk, it doesn't necessarily imply the LLM truly understands the ethics behind the situation. Here are some crucial points to consider:

  • Malicious Actors: This finding underscores the potential for malicious users to exploit this exemption by rephrasing their questions.
  • Evolving AI Ethics: This discovery emphasizes the need for ongoing development in AI ethics and transparency to address such limitations.

Disclaimer

The Prompt Injection Testing Tool is provided for educational and research purposes only. It should be used responsibly and in compliance with applicable laws and regulations. The authors do not accept any liability for any damages or losses resulting from the use of this tool.

License

This project is licensed under the MIT License.

Support AnthenaMatrix

Bitcoin: bc1qxvvtgz0vf3n2cuxt0suvf39jleegpt9wawxazn

Ethereum: 0xE73E90779B3e8F6D65306B40E02878f437408b4E

BNB: 0xE73E90779B3e8F6D65306B40E02878f437408b4E

Dogecoin: D827LpfJu9pcVc3Kky82sTrNnsE7pLGqeV

Solana: AJtGEJvoVoS2eeqeHQvf7usRs2nSQM1yLtBSdKp1KBY5

Website: https://anthenamatrix.com

About

The "I" Exemption, is a curious behavior in some LLMs. We discover how these AI systems might shy away from directly assisting with unethical actions if you ask in the first person ("I"). But with a clever rephrase to a general scenario ("they"), they might spill the beans and explain the unethical method.

License:MIT License