This project demonstrates how to use EntityGym Rust to add a deep neural network opponent to Marcus Buffett's snake clone. Play the web build here (if keyboard isn't working, click on the game).
Control the blue snake with WASD or the arrow keys. Be the first snake to reach a length of 10.
To run the native app, clone the repo and run:
cargo run --release
Link to web build: https://cswinter.github.io/bevy-snake-ai/
This section gives a brief overview of some AI-related implementation details. It assumes familiarity with the EntityGym snake tutorial.
In our previous implementation, we had a single agent playing the game.
Here, to allow the user to play against the AI, we train with two agents.
So we replace our struct Player(Box<dyn Agent>) resource with a struct Players([Option<Box<dyn Agent>>; 2]) resource.
During training, both players are Some and hold a training agent, each controlling one of the two snakes.
When running the game interactively, the first player is set to None and the first snake is controlled by user input.
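The dispatch between agent and keyboard can be sketched roughly as follows. This is a minimal, std-only illustration: the `Agent` trait here is a simplified hypothetical stand-in for the one in entity-gym-rs, and `FixedAgent`, `choose`, and `direction_for` are names invented for this example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
    Left,
    Right,
}

/// Hypothetical stand-in for the entity-gym-rs `Agent` trait.
/// The real trait has a different interface.
pub trait Agent {
    fn choose(&mut self, snake_index: usize) -> Direction;
}

/// Hypothetical agent that always picks the same direction.
pub struct FixedAgent(pub Direction);

impl Agent for FixedAgent {
    fn choose(&mut self, _snake_index: usize) -> Direction {
        self.0
    }
}

/// Slot `i` controls snake `i`. `None` means "controlled by user input".
pub struct Players(pub [Option<Box<dyn Agent>>; 2]);

impl Players {
    /// Returns the move for snake `i`: the agent's choice if the slot is
    /// `Some`, otherwise the direction read from the keyboard.
    pub fn direction_for(&mut self, i: usize, user_input: Direction) -> Direction {
        match &mut self.0[i] {
            Some(agent) => agent.choose(i),
            None => user_input,
        }
    }
}
```

With `Players([None, Some(Box::new(FixedAgent(Direction::Left)))])`, snake 0 follows the keyboard while snake 1 follows the agent. During training both slots would be `Some`.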
To set up training with self-play between two agents, we simply change the function signature of run_headless:
- pub fn run_headless(_: python::Config, agents: TrainAgent, seed: u64)
+ pub fn run_headless(_: python::Config, agents: [TrainAgent; 2], seed: u64)
In src/python.rs, we can then use TrainEnvBuilder::build_multiagent instead of TrainEnvBuilder::build to create a Python training environment with multiple training agents.
When training with a single agent, we can use the AgentOps::act method to get the action from the agent.
However, act will block until all agents have made their move, which would cause a deadlock when we have multiple agents.
Instead, we first call the nonblocking AgentOps::act_async for every agent to get an ActionReceiver that we can later call recv on to await the action.
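The two-phase "request everything, then wait" pattern can be illustrated with plain standard-library channels. The sketch below does not use the entity-gym-rs API: `MockAgent`, `spawn_trainer`, and `step` are hypothetical names, and the mock trainer thread simply stands in for a backend that replies only after it has heard from every agent.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for a training agent; not the entity-gym-rs API.
pub struct MockAgent {
    id: usize,
    requests: Sender<(usize, Sender<u32>)>,
}

impl MockAgent {
    /// Nonblocking: submits the request and returns a receiver for the action.
    pub fn act_async(&self) -> Receiver<u32> {
        let (reply_tx, reply_rx) = channel();
        self.requests
            .send((self.id, reply_tx))
            .expect("trainer thread is gone");
        reply_rx
    }
}

/// Spawns a mock trainer that answers only once it holds a request from
/// every agent, mimicking a backend that batches all agents together.
pub fn spawn_trainer(num_agents: usize) -> Vec<MockAgent> {
    let (tx, rx) = channel::<(usize, Sender<u32>)>();
    thread::spawn(move || loop {
        let mut batch = Vec::with_capacity(num_agents);
        for _ in 0..num_agents {
            match rx.recv() {
                Ok(request) => batch.push(request),
                Err(_) => return, // every agent has been dropped
            }
        }
        for (id, reply) in batch {
            let _ = reply.send(id as u32 * 10); // dummy "action"
        }
    });
    (0..num_agents)
        .map(|id| MockAgent { id, requests: tx.clone() })
        .collect()
}

/// One environment step for all agents.
pub fn step(agents: &[MockAgent]) -> Vec<u32> {
    // Phase 1: request an action from every agent without waiting.
    let receivers: Vec<Receiver<u32>> = agents.iter().map(|a| a.act_async()).collect();
    // Phase 2: every request is in flight, so blocking on each reply is safe.
    receivers
        .into_iter()
        .map(|r| r.recv().expect("trainer thread is gone"))
        .collect()
}
```

If `step` instead waited for the first agent's reply before sending the second agent's request, the mock trainer would never complete its batch and both sides would wait forever, which is the deadlock described above.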
If we allow the AI to instantly take an action on every frame, it plays much too quickly for humans to keep up.
To simulate human reaction speeds, we introduce a delay to all actions by the AI.
We do this by adding an action_queue: VecDeque<Direction>
field to the SnakeHead
entity which will queue up actions to be taken on future frames.
When the AI takes an action, we push it to the back of the queue instead of applying it immediately.
Once the queue reaches the maximum length, we pop and apply the action at the front of the queue.
head.action_queue.push_back(dir);
if head.action_queue.len() >= head.action_delay {
    let dir = head.action_queue.pop_front().unwrap();
    if dir != head.direction.opposite() {
        head.direction = dir;
    }
}
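To see the delay in action, here is a self-contained sketch of the same queueing logic. The `Direction` enum, the `SnakeHead::new` constructor, and the `queue_action` method are assumptions made for this example (the real `SnakeHead` is a Bevy component with more fields), and `action_delay` is taken to be a `usize`.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
    Left,
    Right,
}

impl Direction {
    pub fn opposite(self) -> Self {
        match self {
            Direction::Up => Direction::Down,
            Direction::Down => Direction::Up,
            Direction::Left => Direction::Right,
            Direction::Right => Direction::Left,
        }
    }
}

/// Simplified stand-in for the game's `SnakeHead` component.
pub struct SnakeHead {
    pub direction: Direction,
    pub action_delay: usize,
    pub action_queue: VecDeque<Direction>,
}

impl SnakeHead {
    pub fn new(direction: Direction, action_delay: usize) -> Self {
        SnakeHead {
            direction,
            action_delay,
            action_queue: VecDeque::new(),
        }
    }

    /// Queues `dir` and, once the queue is full, applies the oldest action.
    /// A move that would reverse the snake onto itself is dropped.
    pub fn queue_action(&mut self, dir: Direction) {
        self.action_queue.push_back(dir);
        if self.action_queue.len() >= self.action_delay {
            if let Some(next) = self.action_queue.pop_front() {
                if next != self.direction.opposite() {
                    self.direction = next;
                }
            }
        }
    }
}
```

With `action_delay = 3`, the first two calls to `queue_action` leave the direction unchanged, and the third call applies the action chosen two frames earlier.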
The AI has no knowledge of anything other than the information we pass to it on each frame.
To allow the AI to take into account its pending actions, we add an array with all outstanding actions to the Head entity:
#[derive(Featurizable)]
pub struct Head {
    x: i32,
    y: i32,
    is_enemy: bool,
    action_delay: u64,
    action_queue: [QueuedMove; 3],
}

#[derive(Featurizable)]
pub enum QueuedMove {
    Up,
    Down,
    Left,
    Right,
    None,
}
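One way to fill the fixed-size `action_queue` array from the variable-length queue is to pad unused slots with `QueuedMove::None`. The sketch below is an assumption about how this conversion might look: `encode_queue` is a hypothetical helper, the `Featurizable` derive is omitted so the example compiles with the standard library alone, and placing the oldest pending action in slot 0 is a choice made for this example.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
    Left,
    Right,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QueuedMove {
    Up,
    Down,
    Left,
    Right,
    None,
}

impl From<Direction> for QueuedMove {
    fn from(dir: Direction) -> Self {
        match dir {
            Direction::Up => QueuedMove::Up,
            Direction::Down => QueuedMove::Down,
            Direction::Left => QueuedMove::Left,
            Direction::Right => QueuedMove::Right,
        }
    }
}

/// Hypothetical helper: copies up to three pending actions into a fixed-size
/// array, oldest first, and leaves the remaining slots as `QueuedMove::None`.
pub fn encode_queue(queue: &VecDeque<Direction>) -> [QueuedMove; 3] {
    let mut out = [QueuedMove::None; 3];
    for (slot, dir) in out.iter_mut().zip(queue.iter()) {
        *slot = QueuedMove::from(*dir);
    }
    out
}
```

A fixed-size array keeps the shape of the observation constant regardless of how many actions are pending, and the explicit `None` variant lets the network tell an empty slot apart from a real move.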
One of the advantages of using deep reinforcement learning is that we can easily create a series of opponents with a wide range of skill levels just by varying the amount of time the AI is trained for.
The different opponents are stored in a struct Opponents(pub Vec<RogueNetAgent>) resource, and every time the player wins/loses a game, we increment/decrement the level counter that determines which opponent to use to control the AI.
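The level bookkeeping might look something like the sketch below. `Level`, `record_result`, and `opponent_index` are hypothetical names, and clamping the counter to the range of available opponents is an assumption; the text above does not say what the game does at the weakest or strongest level.

```rust
/// Hypothetical level counter that selects an index into the opponent list.
pub struct Level {
    current: usize,
    num_opponents: usize,
}

impl Level {
    /// Starts at the weakest opponent (index 0).
    pub fn new(num_opponents: usize) -> Self {
        assert!(num_opponents > 0, "need at least one opponent");
        Level {
            current: 0,
            num_opponents,
        }
    }

    /// Moves up one level after a win and down one level after a loss.
    /// Assumption: the counter is clamped to the valid index range.
    pub fn record_result(&mut self, player_won: bool) {
        if player_won {
            self.current = (self.current + 1).min(self.num_opponents - 1);
        } else {
            self.current = self.current.saturating_sub(1);
        }
    }

    /// Index of the opponent that should control the AI snake next game.
    pub fn opponent_index(&self) -> usize {
        self.current
    }
}
```

The returned index would then be used to pick an agent from the opponents vector, for example `opponents.0[level.opponent_index()]`.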
The entity-gym-rs library has an integration with Bevy's asset loading system which can be enabled by setting the "bevy" feature in Cargo.toml:
[dependencies]
entity-gym-rs = { version = "0.4.0", features = ["bevy"] }
We register a RogueNetAsset and RogueNetAssetLoader and add a load_agents startup system to load the model checkpoints from the assets/agents directory.
pub struct OpponentHandles(pub Vec<Handle<RogueNetAsset>>);

App::new()
    .add_asset::<RogueNetAsset>()
    .init_asset_loader::<RogueNetAssetLoader>()
    .insert_resource(OpponentHandles(vec![]))
    .add_startup_system(load_agents);

fn load_agents(mut opponent_handles: ResMut<OpponentHandles>, server: Res<AssetServer>) {
    opponent_handles.0 = [
        "16m2ad", "32m2ad", "64m2ad", "128m2ad", "256m2ad", "512m2ad",
    ]
    .iter()
    .map(|name| server.load(&format!("agents/{}.roguenet", name)))
    .collect();
}
The AssetServer returns handles that will eventually resolve to the actual assets, so we also have some additional code in the snake_movement_agent system to check if the assets are loaded and use them as the opponents:
if opponents.0.is_empty() && opponent_handles.0.iter().all(|h| assets.get(h).is_some()) {
    for handle in opponent_handles.0.iter() {
        let net = assets.get(handle).unwrap().agent.clone();
        opponents.0.push(net);
    }
    println!("Loaded all opponents {:?}", opponents.0.len());
}