Generative AI has initiated a technological arms race between the creation of hyper-realistic synthetic media and the development of tools to detect it. While much research focuses on the technical aspects of automated detection, a critical gap exists in understanding human perception. How do people judge the authenticity of content when the lines between human- and machine-generated text are increasingly blurred?
As identified in our foundational paper, "Blessing or curse? A survey on the Impact of Generative AI on Fake News", there is a pressing need for empirical data to understand how these technologies influence public trust and information integrity. The paper highlights a "notable gap in the literature" concerning the dual-use nature of Generative AI and calls for research to explore both the technological and social countermeasures required to safeguard the information ecosystem.
JudgeGPT is not just a survey; it is a live research platform designed to systematically collect and analyze human judgments on news authenticity. It serves as the practical instrument built to address this identified research gap, providing the crucial data needed to navigate this new information landscape.
To effectively study human perceptions of AI-generated news, a reliable and controllable source of stimuli is required. This project employs a two-part research apparatus, comprising the `JudgeGPT` and `RogueGPT` repositories, which together form a complete, end-to-end experimental pipeline. This structure ensures methodological rigor by allowing for the systematic generation and evaluation of news content.
The relationship between these projects is not merely collaborative; it is a functional and structured research workflow. `RogueGPT` serves as the stimulus generation engine, creating content under controlled experimental conditions. `JudgeGPT` is the data collection platform, where human participants evaluate that content, providing the raw data for analysis.
The process flows from controlled generation to human judgment, creating a rich dataset that links specific content characteristics to perception scores:
- Controlled Stimulus Generation (`RogueGPT`): A researcher uses the `RogueGPT` interface to generate news fragments. The generation process is highly controlled, using specific variables defined in a configuration file (`prompt_engine.json`). These variables include parameters such as the news outlet `Style` (e.g., 'NYT', 'BILD'), `Format` ('tweet', 'short article'), `Language` ('en', 'de'), and the underlying `GeneratorModel` (e.g., 'openai_gpt-4-turbo_2024-04-09').
- Data Storage (MongoDB): Each generated fragment, along with its full metadata (the parameters used to create it), is stored in a shared MongoDB database. This is handled by the `save_fragment` function within `RogueGPT`'s codebase, which uses the PyMongo library to interact with the database.
- Human Data Collection (`JudgeGPT`): A participant accesses the `JudgeGPT` survey application. The application retrieves a fragment from the MongoDB collection to present to the user.
- Judgment and Analysis: The participant reads the news fragment and uses sliders to rate its perceived authenticity (Real vs. Fake) and origin (Human vs. Machine). This judgment data is then saved back to the database, creating a comprehensive record that links specific generation parameters to quantitative human perception scores. This closed-loop system allows for robust statistical analysis of the factors that influence the believability of AI-generated text. A minimal sketch of this loop follows the list.
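For illustration, the sketch below shows what this closed loop could look like in code. The `save_fragment` role, the PyMongo dependency, and the generation parameters are taken from the description above; the database, collection, and field names are assumptions made for the example, not the project's actual schema.

```python
# Hedged sketch of the generation -> storage -> judgment loop.
# Database, collection, and field names are illustrative assumptions.
import os
from datetime import datetime, timezone

from pymongo import MongoClient

db = MongoClient(os.environ["MONGO_CONNECTION_STRING"])["judgegpt"]

def save_fragment(text, metadata):
    """RogueGPT side: store a generated fragment with its generation parameters."""
    doc = {"text": text, **metadata, "created_at": datetime.now(timezone.utc)}
    return db["fragments"].insert_one(doc).inserted_id

def save_judgment(fragment_id, real_score, machine_score):
    """JudgeGPT side: store a participant's slider ratings for a fragment."""
    db["judgments"].insert_one({
        "fragment_id": fragment_id,
        "real_score": real_score,        # e.g. 0 = fake, 100 = real
        "machine_score": machine_score,  # e.g. 0 = human, 100 = machine
        "judged_at": datetime.now(timezone.utc),
    })

if __name__ == "__main__":
    fragment_id = save_fragment(
        "Example generated headline ...",
        {"Style": "NYT", "Format": "tweet", "Language": "en",
         "GeneratorModel": "openai_gpt-4-turbo_2024-04-09"},
    )
    save_judgment(fragment_id, real_score=30, machine_score=80)
```

Because each judgment document references the fragment it rates, analyses can join perception scores directly to generation parameters such as `Style` or `GeneratorModel`.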
Whether you are a curious individual, a fellow researcher, or a developer, there are many ways to engage with the JudgeGPT project. Find the path that's right for you below.
Audience | Primary Goal | Action |
---|---|---|
General Public | Test your ability to spot AI-generated news and contribute to our dataset. | Participate in the Survey · Test the React Beta |
Researchers | Understand, cite, or collaborate on this research. | Read the Paper · ✉️ Contact Us · See Citation |
Developers | Contribute code, fix bugs, or suggest features. | Fork the Repo · 🐞 Open an Issue · See Contributing Guide |
Are you an expert in AI, policy, or journalism? We are conducting a follow-up study to gather expert perspectives on the risks and mitigation strategies related to AI-driven disinformation. Your insights are invaluable for this research.
Please consider contributing by participating in our 15-minute survey: ➡️ https://forms.gle/EUdbkEtZpEuPbVVz5
- Purpose: This survey explores expert perceptions of generative-AI–driven disinformation for an academic research project.
- Data Use: All responses will be treated as confidential and reported in an anonymised, aggregated format by default. At the end of the survey, you will have the option to be publicly acknowledged for your contribution. All data will be used for academic purposes only.
- Time: Approximately 15 minutes.
This section provides a comprehensive guide for developers and technical users who wish to run, inspect, or contribute to the JudgeGPT project locally.
The project is built with the following components:
- Frontend: The primary interface is a Streamlit application, written in Python, designed for rapid prototyping and data interaction. An experimental port to React is in development for a more robust and scalable user experience.
- Backend: The application logic is written in Python and is contained within the main Streamlit script (`app.py`).
- Database: A MongoDB (NoSQL) database is used to store news fragments and user judgments. The dependency on `pymongo[srv]` suggests compatibility with cloud-hosted instances like MongoDB Atlas. A minimal sketch of how these components fit together follows this list.
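To make the stack concrete, here is a minimal sketch of how a Streamlit frontend could pull a fragment from MongoDB and collect the two slider ratings. It is not the project's actual `app.py`; the sampling logic, collection names, and widget labels are assumptions made for illustration.

```python
# Illustrative Streamlit sketch, not the project's actual app.py.
import os

import streamlit as st
from pymongo import MongoClient

@st.cache_resource
def get_db():
    """Open a single shared MongoDB connection for the app."""
    return MongoClient(os.environ["MONGO_CONNECTION_STRING"])["judgegpt"]

db = get_db()
fragment = db["fragments"].find_one()  # real sampling logic would go here

if fragment is None:
    st.warning("No fragments available yet.")
else:
    st.write(fragment["text"])
    real_score = st.slider("Fake (0) vs. Real (100)", 0, 100, 50)
    machine_score = st.slider("Human (0) vs. Machine (100)", 0, 100, 50)
    if st.button("Submit judgment"):
        db["judgments"].insert_one({
            "fragment_id": fragment["_id"],
            "real_score": real_score,
            "machine_score": machine_score,
        })
        st.success("Thank you! Your judgment has been recorded.")
```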
Follow these steps to set up the project on your local machine.
- Prerequisites
  - Python 3.8+
  - pip package manager
  - Git
- Clone the Repository: Open your terminal and run the following commands:

  ```bash
  git clone https://github.com/aloth/JudgeGPT.git
  cd JudgeGPT
  ```

- Set Up a Virtual Environment (Recommended): To maintain clean dependencies, it is highly recommended to use a virtual environment.

  ```bash
  # For macOS/Linux
  python3 -m venv venv
  source venv/bin/activate

  # For Windows
  python -m venv venv
  .\venv\Scripts\activate
  ```

- Install Dependencies: Install all required Python packages from the `requirements.txt` file. Key dependencies include `streamlit`, `pymongo`, and `openai`.

  ```bash
  pip install -r requirements.txt
  ```

- Configure Environment Variables: The application requires a connection string to a MongoDB database. For local development, it is best practice to manage this secret using an environment variable rather than hardcoding it. You will need to set a variable (e.g., `MONGO_CONNECTION_STRING`) with your database URI. A short connection-check sketch follows this list.
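As an optional sanity check after setting the variable, the snippet below verifies that the connection string works. The shell commands in the comments and the `judgegpt` database name are assumptions for illustration; only the PyMongo dependency and the environment-variable approach come from the text above.

```python
# Optional sanity check for the MongoDB connection string.
# Set the variable in your shell first, for example:
#   export MONGO_CONNECTION_STRING="mongodb+srv://<user>:<password>@<cluster>/"   # macOS/Linux
#   set MONGO_CONNECTION_STRING=mongodb+srv://<user>:<password>@<cluster>/        # Windows (cmd)
import os

from pymongo import MongoClient

uri = os.environ.get("MONGO_CONNECTION_STRING")
if not uri:
    raise SystemExit("MONGO_CONNECTION_STRING is not set")

client = MongoClient(uri)
client.admin.command("ping")  # raises if the URI or credentials are wrong
print("MongoDB connection OK")
```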
Once the setup is complete, launch the Streamlit application with the following command:
```bash
streamlit run app.py
```
The application will open in your default web browser.
The survey experience can be customized by passing parameters in the URL; a sketch of how such parameters could be read follows the list below.
- Language Support: JudgeGPT automatically detects the user's browser language, but the language can also be set manually. Supported languages are English (`en`), German (`de`), French (`fr`), and Spanish (`es`).
  - Example for German: `https://judgegpt.streamlit.app/?language=de`
- Age Range: You can filter participants by age using the `min_age` and `max_age` parameters.
  - Example for ages 15-25: `https://judgegpt.streamlit.app/?min_age=15&max_age=25`
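For reference, this is one way such query parameters could be read in a Streamlit app using `st.query_params` (available in recent Streamlit releases; older versions expose `st.experimental_get_query_params` instead). The parameter names match the examples above, but the defaults and validation are illustrative assumptions rather than the logic in `app.py`.

```python
# Hedged sketch of reading the documented URL parameters in Streamlit.
# Defaults and validation are illustrative assumptions.
import streamlit as st

SUPPORTED_LANGUAGES = {"en", "de", "fr", "es"}

params = st.query_params  # e.g. ?language=de&min_age=15&max_age=25

language = params.get("language", "en")
if language not in SUPPORTED_LANGUAGES:
    language = "en"

min_age = int(params.get("min_age", 0))
max_age = int(params.get("max_age", 120))

st.write(f"Survey language: {language}")
st.write(f"Accepting participants aged {min_age} to {max_age}")
```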
JudgeGPT is an actively evolving research platform. The project roadmap is directly guided by the open challenges in the field of AI-driven misinformation, as outlined in our research paper. We welcome collaboration on the following key areas, which together form a strategic agenda for advancing scientific understanding rather than a simple feature list.
The current focus is on text-based news. However, the proliferation of "deepfakes" and other synthetic media presents a growing threat. The roadmap includes adding image and video support, allowing participants to evaluate the authenticity of visual content. This directly addresses the challenge of multimedia disinformation and expands the project's scope to cover the creation and detection of synthetic realities.
To keep pace with the "technological arms race," the research must test human perception against an ever-wider array of sophisticated models. This involves deeper integration with `RogueGPT`'s cross-model generation capabilities, incorporating outputs from models like BERT, T5, and other emerging LLMs. This will create a more challenging and ecologically valid testbed for human judgment.
The project aims to move beyond pure detection toward active mitigation. Future work includes building a content verification layer and integrating with established fact-checking services. This would allow the platform not only to identify potentially false content but also to provide users with corrective information, aligning with psychological "inoculation" strategies against misinformation.
True understanding of the fake news phenomenon requires a global perspective. The planned expansion of localization and multilingual support is not merely about translation. It is about enabling research into how the perception of AI-generated content differs across languages and cultural contexts, a significant and under-explored area of inquiry.
To gather high-quality data at scale, participant engagement is key. The roadmap includes implementing gamification elements (scores, badges), an interactive results dashboard, and personalized feedback mechanisms. These features are designed not just for user enjoyment but to increase participant retention, motivation, and the overall volume and quality of the collected research data.
To serve as a long-term, large-scale public resource, the platform must be robust and scalable. This involves transitioning to a production-grade cloud environment (e.g., Microsoft Azure) and making significant UI/UX enhancements, including the ongoing port to React. This ensures the platform's longevity and its ability to support a growing community of participants and researchers.
If you use JudgeGPT or its underlying research in your work, please cite our foundational paper:
```bibtex
@misc{loth2024blessing,
      title={Blessing or curse? A survey on the Impact of Generative AI on Fake News},
      author={Alexander Loth and Martin Kappes and Marc-Oliver Pahl},
      year={2024},
      eprint={2404.03021},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
We welcome contributions from the community! To get involved, please follow these steps:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for full details.
This work would not be possible without the foundational technologies and support from:
- OpenAI for their groundbreaking GPT models.
- Streamlit for enabling the rapid development of our web application.
- MongoDB for robust and scalable database solutions.
- The broader open-source community for providing invaluable tools and libraries.
JudgeGPT is an independent research project and is not affiliated with, endorsed by, or in any way officially connected to OpenAI. The use of "GPT" within the project name is employed in a pars pro toto manner, where it represents the broader class of Generative Pre-trained Transformer models and Large Language Models (LLMs) that are the subject of this research. The project's explorations and findings are its own and do not reflect the views or positions of OpenAI. We are committed to responsible AI research and adhere to ethical guidelines in all aspects of our work.