This repository provides a Gym environment for a Webots drone scene, focused on UAV navigation research. The current research project targets simulated outdoor fire emergency scenarios and builds on a previous Gym wrapper for a forest fire scene. The environment is intended for training a Reinforcement Learning agent to control a drone in a fire emergency context. To fly the drone, the algorithm must be able to work in a continuous domain for both the action and state spaces. The agent’s state can be represented by an image, a state vector, or both, from which the agent must decide the action to take. The agent’s mission is to approach the fire location while keeping a safe distance.
- 2024/02/09 [Major fixes] v1.2: update reward values $\in [-1, 0]$ and add distance difference as a boolean factor
- 2024/01/30 [Major fixes] v1.1: multi-modal approaches support, store data fixes
There are currently two environments, which differ in their action space.
- webots_drone/DroneEnvContinuous-v0 presents a continuous action space composed of a 4-element vector $\{\phi, \theta, \psi, thrust\}$, corresponding to the roll, pitch, and yaw angles and the desired altitude of the drone.
  - $\phi$ is related to translation along the x-axis and moves the drone left or right.
  - $\theta$ is related to translation along the y-axis and moves the drone forward or backward.
  - $\psi$ is related to rotation about the z-axis and rotates the drone counterclockwise or clockwise.
  - $thrust$ is related to translation along the z-axis and moves the drone up or down.
- webots_drone/DroneEnvDiscrete-v0 is an extension of DroneEnvContinuous-v0 with a discrete action space composed of 6 actions plus 1 no-action possibility. The action is discretized and the same step logic as in the continuous domain is applied. Two actions are considered for each continuous degree of freedom, with a fixed altitude, resulting in:
  - Action 0: no-action
  - Action 1: affects $\phi$ with $\pi / 12 \Rightarrow [\pi / 12, 0, 0, 0]$
  - Action 2: affects $\theta$ with $\pi / 12 \Rightarrow [0, \pi / 12, 0, 0]$
  - Action 3: affects $\psi$ with $\pi \Rightarrow [0, 0, \pi, 0]$
  - Action 4: affects $\phi$ with $-\pi / 12 \Rightarrow [-\pi / 12, 0, 0, 0]$
  - Action 5: affects $\theta$ with $-\pi / 12 \Rightarrow [0, -\pi / 12, 0, 0]$
  - Action 6: affects $\psi$ with $-\pi \Rightarrow [0, 0, -\pi, 0]$
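The discrete-to-continuous mapping listed above can be sketched as a lookup table. This is only an illustration: the names `DISCRETE_ACTIONS` and `discrete_to_continuous` are hypothetical and are not taken from the repository, which may implement the mapping differently.

```python
import numpy as np

# Hypothetical lookup table built from the action list above.
# Column order: [roll (phi), pitch (theta), yaw (psi), thrust].
DISCRETE_ACTIONS = np.array([
    [0.0, 0.0, 0.0, 0.0],          # 0: no-action
    [np.pi / 12, 0.0, 0.0, 0.0],   # 1: positive roll
    [0.0, np.pi / 12, 0.0, 0.0],   # 2: positive pitch
    [0.0, 0.0, np.pi, 0.0],        # 3: positive yaw
    [-np.pi / 12, 0.0, 0.0, 0.0],  # 4: negative roll
    [0.0, -np.pi / 12, 0.0, 0.0],  # 5: negative pitch
    [0.0, 0.0, -np.pi, 0.0],       # 6: negative yaw
])


def discrete_to_continuous(action):
    """Return the 4-element continuous vector for a discrete action index."""
    if not 0 <= action < len(DISCRETE_ACTIONS):
        raise ValueError(f"unknown discrete action: {action}")
    # Copy so callers cannot modify the shared table in place.
    return DISCRETE_ACTIONS[action].copy()


print(discrete_to_continuous(5))
```

The thrust component stays at zero for every entry, which matches the fixed altitude described for the discrete environment.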
The observation space is a high-dimensional image, represented by the drone’s
The reward function is the Euclidean distance between the drone’s position and the safe zone edge, calculated from the fire location (target position). The safe zone edge is defined using the fire location as its base, with a radius of 4 times the fire’s height. This reward starts with a value below zero and increases as the drone gets closer to the fire location. If the value becomes greater than zero, the episode ends. The reward function is defined as follows:
The reward signal comprises two components: distance and orientation. The distance component is based on the difference between the target position and the UAV position, with two threshold values corresponding to the detection range and the nearest distance the UAV can reach. The following functions implement it:
import numpy as np

# compute_distance is used below but defined elsewhere in the repository.


def compute_direction_vector(position, orientation, ref_position):
    # Calculate the vector pointing from the agent position to the target position
    vector_to_target = np.array(ref_position) - np.array(position)
    # Normalize the vector
    norm_vector_to_target = vector_to_target / np.linalg.norm(vector_to_target)
    # Calculate the agent's forward vector based on its orientation
    agent_forward_vector = np.array([np.cos(orientation),
                                     np.sin(orientation)])
    return norm_vector_to_target, agent_forward_vector


def compute_orientation_reward(position, orientation, ref_position):
    # Get direction vectors
    direction_to_target, agent_forward = compute_direction_vector(
        position, orientation, ref_position)
    # Calculate cosine similarity between the direction to the target and
    # the agent's forward direction
    cosine_similarity = np.dot(direction_to_target, agent_forward)
    # Map the similarity from [-1, 1] to a reward in [-1, 0]
    return (cosine_similarity - 1.) / 2.


def compute_distance_reward(position, ref_position, distance_max=50.,
                            distance_threshold=25., threshold_offset=5.):
    curr_distance = compute_distance(position, ref_position)
    # The reward peaks at the middle of the goal region
    safety_distance = distance_threshold - threshold_offset / 2
    reward = 1 - abs(1 - curr_distance / safety_distance)
    reward = max(-1., reward)
    # Closer than the goal region: worst reward
    if curr_distance < distance_threshold - threshold_offset:
        return -1.
    return (reward - 1.) / 2.


def sum_and_normalize(orientation_rewards, distance_rewards, distance_diff=1.):
    # Shift both components from [-1, 0] to [0, 1] before multiplying
    r_distance = (distance_rewards + 1.)
    r_orientation = (orientation_rewards + 1.)
    # distance_diff acts as a boolean factor: no movement gives no reward
    r_sum = r_distance * r_orientation * (distance_diff != 0.)
    return r_sum - 1.
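As a worked example of how the two components combine, the sketch below uses condensed copies of the functions above and evaluates a few UAV poses around a target at the origin. `compute_distance` is not shown in this document, so a plain Euclidean distance is assumed here; the poses and default thresholds are illustrative only.

```python
import numpy as np


def compute_distance(a, b):
    # Assumption: Euclidean distance (the original helper is not shown).
    return float(np.linalg.norm(np.array(a) - np.array(b)))


def compute_orientation_reward(position, orientation, ref_position):
    to_target = np.array(ref_position) - np.array(position)
    to_target = to_target / np.linalg.norm(to_target)
    forward = np.array([np.cos(orientation), np.sin(orientation)])
    return (np.dot(to_target, forward) - 1.) / 2.


def compute_distance_reward(position, ref_position,
                            distance_threshold=25., threshold_offset=5.):
    curr_distance = compute_distance(position, ref_position)
    if curr_distance < distance_threshold - threshold_offset:
        return -1.
    safety_distance = distance_threshold - threshold_offset / 2
    reward = max(-1., 1 - abs(1 - curr_distance / safety_distance))
    return (reward - 1.) / 2.


def sum_and_normalize(orientation_rewards, distance_rewards, distance_diff=1.):
    r_sum = ((distance_rewards + 1.) * (orientation_rewards + 1.)
             * (distance_diff != 0.))
    return r_sum - 1.


def total_reward(position, orientation, target, distance_diff=1.):
    return sum_and_normalize(
        compute_orientation_reward(position, orientation, target),
        compute_distance_reward(position, target),
        distance_diff=distance_diff)


target = (0., 0.)
# At the safety distance (22.5) and facing the target: best case.
on_target = total_reward((-22.5, 0.), 0., target)
# Same position, facing away from the target.
facing_away = total_reward((-22.5, 0.), np.pi, target)
# Twice the safety distance, facing the target.
far = total_reward((-45., 0.), 0., target)
# Closer than distance_threshold - threshold_offset (20.).
too_close = total_reward((-10., 0.), 0., target)
# Good pose, but the UAV did not move since the last step.
hovering = total_reward((-22.5, 0.), 0., target, distance_diff=0.)

for name, value in [("on_target", on_target), ("facing_away", facing_away),
                    ("far", far), ("too_close", too_close),
                    ("hovering", hovering)]:
    print(f"{name}: {value:.3f}")
```

Because the shifted components are multiplied, either a bad orientation or a bad distance alone is enough to drive the combined reward to its minimum of -1.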
Additionally, some penalties are applied to ensure safety and energy efficiency. A ring zone delimits the risk area, which the UAV must avoid because it can suffer damage there. Just outside it lies the goal region, the closest area the UAV may reach around the fire. A square area delimits the allowed flight area to prevent the drone from flying too far away. To ensure the drone flies far enough from obstacles, a penalty is added if it is near any object or has collided with one. The following method implements it:
def __compute_penalization(self, info, curr_distance):
    near_object_threshold = [150 / 4000,  # front left
                             150 / 4000,  # front right
                             150 / 3200,  # rear top
                             150 / 3200,  # rear bottom
                             150 / 1000,  # left side
                             150 / 1000,  # right side
                             150 / 2200,  # down front
                             150 / 2200,  # down back
                             30 / 800]  # top
    penalization = 0
    penalization_str = ''
    # object_near
    if any(check_near_object(info["dist_sensors"],
                             near_object_threshold)):
        logger.info(f"[{info['timestamp']}] Warning state, ObjectNear")
        penalization -= 10
        penalization_str += 'ObjectNear|'
    # outside flight area
    if any(check_flight_area(info["position"], self.flight_area)):
        logger.info(f"[{info['timestamp']}] Warning state, OutFlightArea")
        penalization -= 10
        penalization_str += 'OutFlightArea|'
    # is_collision
    if any(check_collision(info["dist_sensors"])):
        logger.info(f"[{info['timestamp']}] Warning state, Near2Collision")
        penalization -= 10
        penalization_str += 'Near2Collision|'
    # risk zone trespassing
    if curr_distance < self.sim.risk_distance:
        logger.info(f"[{info['timestamp']}] Warning state, InsideRiskZone")
        penalization -= 10
        penalization_str += 'InsideRiskZone'

    if len(penalization_str) > 0:
        info['penalization'] = penalization_str

    return penalization
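The helper `check_near_object` is called above but not shown. A minimal sketch of how such a per-sensor check could work follows, under two loudly stated assumptions: each distance sensor reading is normalized to [0, 1] by the sensor's maximum range, and a smaller value means a closer object. The repository's own helper may behave differently.

```python
# Thresholds copied from the method above, one per distance sensor.
NEAR_OBJECT_THRESHOLD = [150 / 4000, 150 / 4000,  # front left, front right
                         150 / 3200, 150 / 3200,  # rear top, rear bottom
                         150 / 1000, 150 / 1000,  # left side, right side
                         150 / 2200, 150 / 2200,  # down front, down back
                         30 / 800]                # top


def check_near_object(sensor_readings, thresholds):
    """Hypothetical stand-in: flag each sensor reading below its threshold.

    Assumes readings are normalized to [0, 1] and that smaller is closer.
    """
    if len(sensor_readings) != len(thresholds):
        raise ValueError("one threshold per sensor reading is required")
    return [reading < limit
            for reading, limit in zip(sensor_readings, thresholds)]


# Nothing in range except an object close to the left side sensor.
readings = [1.0, 1.0, 1.0, 1.0, 0.1, 1.0, 1.0, 1.0, 1.0]
flags = check_near_object(readings, NEAR_OBJECT_THRESHOLD)
print(any(flags), flags.index(True))
```

Since each of the four checks in the method subtracts 10 at most once, the returned penalization ranges from 0 (no warning) down to -40 (all four warnings at once).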
If any penalization occurs, the penalty score is immediately returned; otherwise, the computed distance and orientation signal is returned instead. Each value is normalized
def compute_reward(self, obs, info):
    """Compute the distance-based reward.

    Compute the distance between drone and fire.
    This consider a risk_zone to 4 times the fire height as mentioned in
    Firefighter Safety Zones: A Theoretical Model Based on Radiative
    Heating, Butler, 1998.

    :param float distance_threshold: Indicate the acceptable distance
        margin before the fire's risk zone.
    """
    info['penalization'] = 'no'
    info['final'] = 'no'

    # terminal states
    discount, end = self.__is_final_state(info)
    if end:
        self._end = end
        return discount

    # 2 dimension considered
    if len(self.last_info.keys()) == 0:
        uav_pos_t = info['position'][:2]  # pos_t+1
    else:
        uav_pos_t = self.last_info['position'][:2]  # pos_t
    uav_pos_t1 = info['position'][:2]  # pos_t+1
    uav_ori = info['north_rad']
    target_xy = self.sim.get_target_pos()[:2]

    # not terminal, must be avoided
    goal_distance = compute_distance(uav_pos_t1, target_xy)
    penalization = self.__compute_penalization(info, goal_distance)
    if penalization != 0:
        return penalization

    # compute reward components
    orientation_reward = compute_orientation_reward(uav_pos_t, uav_ori,
                                                    target_xy)
    distance_reward = compute_distance_reward(
        uav_pos_t, target_xy, distance_max=50.,
        distance_threshold=np.sum(self._goal_threshold),
        threshold_offset=self._goal_threshold[1])

    distance_diff = compute_distance(uav_pos_t, uav_pos_t1)
    # combine both components with the function defined above
    reward = sum_and_normalize(orientation_reward, distance_reward,
                               distance_diff=distance_diff)
    return reward
To use this environment, you need to download and install the Webots simulator first. Then clone the repository and install it via pip:
git clone https://github.com/angel-ayala/gym-webots-drone.git
cd gym-webots-drone
pip install -e .
Then, from Webots, open the world scene located in worlds/forest_tower.wbt. If no errors are shown in the Webots console, you can test the scene connection. Inside your local copy of this repository, run:
python webots_drone/webots_simulation.py
You should be able to control the drone in the Webots scene with the w, a, s, d and arrow keys; more details should appear in the Webots console.
import gym

env = gym.make('webots_drone:webots_drone/DroneEnvDiscrete-v0',
               time_limit_seconds=60,  # 1 min
               max_no_action_seconds=5,  # 5 sec
               frame_skip=25,  # 200 ms (use 125 for 1 sec)
               goal_threshold=25.,
               init_altitude=25.,
               altitude_limits=[11, 75],
               is_pixels=False)
print('reset')
env.reset()
for _ in range(1250):
    env.render()
    action = env.action_space.sample()
    print('action', action)
    env.step(action)  # take a random action
env.close()
This environment is an interface for the Webots simulation software, version R2023b; you should be able to run the Gym environment after downloading and installing that version. One main aspect to highlight is ensuring that the Webots environment variables, such as WEBOTS_HOME and LD_LIBRARY_PATH, are set. On Linux they can be set with:
export WEBOTS_HOME=/path/to/webots
export LD_LIBRARY_PATH=$WEBOTS_HOME/lib/controller
In order to check:
echo $WEBOTS_HOME # must show /path/to/webots
echo $LD_LIBRARY_PATH # must show /path/to/webots/lib/controller
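The same check can be done from Python before creating the environment. The sketch below is only a convenience under the assumptions of the shell commands above (the controller library lives in `$WEBOTS_HOME/lib/controller`); the function name `check_webots_env` is hypothetical and not part of the repository.

```python
import os


def check_webots_env(environ=None):
    """Return a list of problems found in the Webots environment variables."""
    env = os.environ if environ is None else environ
    problems = []
    home = env.get("WEBOTS_HOME", "")
    if not home:
        problems.append("WEBOTS_HOME is not set")
        return problems
    # Expected controller library directory, as in the export lines above.
    expected = os.path.join(home, "lib", "controller")
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if expected not in entries:
        problems.append(f"LD_LIBRARY_PATH does not contain {expected}")
    return problems


for problem in check_webots_env():
    print("Warning:", problem)
```

An empty list means both variables look consistent with the setup described above.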
Finally, you can run your code that uses this environment as usual.
A simulated Webots scene with a first approach to fire simulation through the FireSmoke PROTO file. In this scene, the DJI Mavic 2 Pro drone can be controlled by varying the roll, pitch, and yaw angles and the altitude. The scene is intended to run autonomously from the Webots interface using a Robot node with the supervisor option set to TRUE and the controller set to ‘<extern>’. The WebotsSimulation class is configured to run from a terminal and is able to:
- Randomize the size and location of the FireSmoke node.
- Start the simulation.
- Acquire, and optionally show the image from the drone’s camera.
- Control the drone from keyboard inputs.
- Stop and restart the simulation.
This PROTO node is implemented as a Robot node, using two Display nodes for the fire and smoke images. The FireMovement controller manages these nodes using two sprite sheet images to simulate the movement of the fire and the smoke at low resolution. A safe distance from the fire was implemented using a first approach as presented by Butler [1], which defines the safe distance as 4 times the fire’s height. To achieve this, the fire location is used as the base of a heat radius 4 times the fire’s height. If the drone crosses this boundary, it is considered a final state and the episode is restarted. This node still needs to consider more realistic aspects such as heat propagation and wind resistance.
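The safe-distance rule above (a radius of 4 times the fire’s height around the fire location) can be sketched as follows. The function names and the 2D position format are assumptions made for illustration; they are not the repository’s API.

```python
import math


def risk_distance(fire_height, factor=4.0):
    """Radius of the risk zone: `factor` times the fire's height."""
    return factor * fire_height


def is_inside_risk_zone(drone_xy, fire_xy, fire_height):
    """True when the drone is closer to the fire than the risk radius."""
    return math.dist(drone_xy, fire_xy) < risk_distance(fire_height)


# A 5 m high fire gives a 20 m risk radius around the fire location.
print(risk_distance(5.0))
print(is_inside_risk_zone((0.0, 0.0), (10.0, 0.0), 5.0))
print(is_inside_risk_zone((0.0, 0.0), (25.0, 0.0), 5.0))
```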
TODO
- Fire and Smoke movement.
- Safe distance of heat.
- Smoke cloud.
- Heat propagation.
The Mavic 2 Pro is controlled by the DroneController class, which has two-way communication with WebotsSimulation using an Emitter and a Receiver Webots node on each side. This setup is intended to simulate a ground station radio control at the tower, from which the drone receives angle and thrust disturbances while communicating its current sensor values.
All the sensors and actuators of the drone are managed by the Drone class, which instantiates and controls them. The velocity of each propeller motor is calculated using four PID controllers: one each for the roll, pitch, and yaw angles, and another for the altitude. The controllers were tuned using the Ziegler-Nichols PID tuning technique.
IMPORTANT! The controllers’ PID values were tuned for an 8 ms simulation timestep, with the defaultDamping field of WorldInfo configured using a Damping node with a value of 0.5 for the angular and linear fields.
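To illustrate why the gains depend on the timestep, here is a minimal discrete PID sketch together with the classic closed-loop Ziegler-Nichols rule. It is not the repository’s controller: the class, the ultimate gain `ku`, and the oscillation period `tu` are hypothetical, and the classic rule is assumed since the text does not say which Ziegler-Nichols variant was used.

```python
from dataclasses import dataclass


def ziegler_nichols_pid(ku, tu):
    """Classic Ziegler-Nichols PID gains from ultimate gain and period."""
    kp = 0.6 * ku
    ki = 1.2 * ku / tu
    kd = 0.075 * ku * tu
    return kp, ki, kd


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    dt: float = 0.008  # 8 ms simulation timestep, in seconds
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint, measurement):
        """Return the control output for one timestep."""
        error = setpoint - measurement
        # The integral and derivative terms both scale with dt, so gains
        # tuned for one timestep do not transfer directly to another.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative)


# Hypothetical values, only to show the call pattern.
kp, ki, kd = ziegler_nichols_pid(ku=10.0, tu=2.0)
altitude_pid = PID(kp, ki, kd)
print(altitude_pid.update(setpoint=25.0, measurement=24.5))
```

In this simple form the derivative term reacts strongly on the first update, because the previous error starts at zero.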
The drone is equipped with a Camera node (can be modified in the Webots scene) with an
- [1] Firefighter Safety Zones: A Theoretical Model Based on Radiative Heating, Butler, 1998.