
CLAP: Curiosity-Driven Reinforcement Learning Automatic Penetration Testing Agent

CLAP is a reinforcement learning agent based on PPO that performs penetration testing in a simulated computer network environment (we use the Network Attack Simulator, NASim). The agent is trained to scan for vulnerabilities in the network and exploit them to gain access to various network resources. CLAP was initially proposed in our paper Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach.

Simulated Network Environment: Network Attack Simulator (NASim)

Network Attack Simulator (NASim) is a simulated computer network complete with vulnerabilities, scans and exploits designed to be used as a testing environment for AI agents and planning techniques applied to network penetration testing.

Compared to the original paper, this repo makes the following changes:

Developed based on CleanRL
Added an LSTM for POMDP scenarios
- Motivated by the paper Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs
To support NASim's 2D observation space, a Transformer encoder was implemented as the perception module
- However, it is extremely unstable to train
- To learn more about the Transformer encoder, see Yekun's Note

Prerequisites:

To run this code, you will need to have the following installed on your system:

Python 3.5 or later (PyTorch 2.0 itself requires Python 3.8 or later)
PyTorch 2.0 or later
OpenAI Gym 0.21.0 (the API changed significantly after 0.25)
NASim 0.9.1

Note that OpenAI Gym changed significantly after version 0.25.0, including a new step API, so code written against 0.21.0 may not run unchanged on later releases.
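
To make the step API difference concrete: the older API returns a 4-tuple (obs, reward, done, info) from step, while the newer API returns a 5-tuple (obs, reward, terminated, truncated, info). The sketch below shows one way to fold either shape back into the older 4-tuple. The helper name normalize_step is hypothetical; it is not part of Gym, NASim, or this repo, and is only meant to illustrate the difference.

```python
from typing import Any, Dict, Tuple


def normalize_step(result: tuple) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Hypothetical helper: map either Gym step result to (obs, reward, done, info).

    Old step API: (obs, reward, done, info)
    New step API: (obs, reward, terminated, truncated, info)
    """
    if len(result) == 4:
        # Old API: the tuple already has the shape we want.
        obs, reward, done, info = result
        return obs, reward, bool(done), info
    if len(result) == 5:
        # New API: an episode ends if it terminated OR was truncated.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    raise ValueError(f"Unexpected step result of length {len(result)}")


if __name__ == "__main__":
    # Stand-in tuples; in practice these would come from env.step(action).
    print(normalize_step(("obs", 1.0, False, {})))
    print(normalize_step(("obs", 1.0, False, True, {})))
```

A training loop written for Gym 0.21.0 could wrap each env.step(action) call in such a helper. Note that collapsing terminated and truncated into one flag discards the distinction the new API was introduced to expose, which can matter when bootstrapping value targets.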

Get Started

Use Conda to manage the Python environment and Poetry to manage packages.

Clone this repo:

git clone https://github.com/yyzpiero/RL4RedTeam.git

Create conda environment:

conda create -p ./venv python==X.X

and use poetry to install all Python packages:

poetry install

Train the agent

To train the agent, you can use the following command:

cd ./algo
python clap.py

This will start the training process, which will run until the agent reaches a satisfactory level of performance. The performance of the agent will be printed to the console at regular intervals, so you can monitor its progress.

Contributing

The PPO implementation is heavily based on Costa Huang's fantastic library CleanRL.

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Citing `CLAP`

@article{yang2022behaviour,
  title={Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach},
  author={Yang, Yizhou and Liu, Xin},
  journal={arXiv preprint arXiv:2202.10630},
  year={2022}
}

TODOs

Add the original code for CLAP
Add Random Network Distillation (RND)
Include figures of the training results

Limitations

This implementation of the PPO algorithm is not intended for use in real-world penetration testing. It is only meant for use in a simulated environment, and should not be used to perform actual penetration testing on real networks.
