👨‍👩‍👧‍👦 MA Adaptive Project Record
jc-bao opened this issue · comments
Chaoyi Pan commented
Research Problem
- Adapt to other policies (like cooperative lifting)
- Share environment information (partial observations, physical states, interactions between the two drones)
Chaoyi Pan commented
Week1
Investigate the research problem.
Key results:
- Engineering: Implement dual-quadrotor rigid-link transportation
- Engineering: Train a centralized policy in this environment
Progress
bzx20 commented
Literature Review
Learning Vision-based Pursuit-Evasion Robot Policies
Basics
- Addresses the complex task of learning strategic robot behavior, in particular pursuit-evasion interactions under real-world constraints.
- Supervised learning: a fully-observable robot policy provides supervision for a partially-observable one.
- The quality of the supervision for the partially-observable pursuer policy is found to depend on two critical factors: achieving a balance between diversity and optimality in the evader's behavior and considering the strength of modeling assumptions in the fully-observable policy.
Details
- Fully-Observable Policy: future trajectory --> latent intent, which together with the relative state produces $\pi^*$
- Partially-Observable Policy: estimate & action history --> imitated latent intent, which together with the estimate produces $\pi^p$
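One way to write the two policies down is sketched here; it is not necessarily the paper's exact formulation. The encoders $E^*$ and $E^p$, the future-trajectory symbol $\tau^{fut}_t$, the estimate $\hat{x}^{rel}$, the action symbol $u_t$, the history length $H$, and the squared-error imitation loss are all assumptions introduced for illustration.

$$
\begin{aligned}
z_t &= E^*\big(\tau^{fut}_t\big), & u_t &= \pi^*\big(x^{rel}_t, z_t\big) \\
\hat{z}_t &= E^p\big(\hat{x}^{rel}_{t-H:t},\, u_{t-H:t-1}\big), & u_t &= \pi^p\big(\hat{x}^{rel}_t, \hat{z}_t\big) \\
\mathcal{L} &= \big\lVert \hat{z}_t - z_t \big\rVert_2^2
\end{aligned}
$$

Under these assumptions the fully-observable policy sees the privileged future trajectory only through $z_t$, so the partially-observable policy can be supervised by regressing $\hat{z}_t$ onto $z_t$ from quantities that are available at deployment.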
bzx20 commented
Learning Vision-based Pursuit-Evasion Robot Policies
Evader Policy
- Random policy
  Motion primitives (MP) are the Cartesian product of regularly discretized linear and angular velocities
- MARL policy
  $\pi^*(x^{rel},z_t)$
  The evader trains against a pre-trained, fully-observable pursuer policy (using a curriculum in which the pursuer speed is increased at fixed iteration intervals)
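The two evader ingredients above can be illustrated with a small sketch in plain Python. The function names, velocity ranges, grid sizes, and the step-shaped speed schedule are hypothetical choices made here, not values from the paper.

```python
import itertools


def make_motion_primitives(v_max, w_max, n_v, n_w):
    """Cartesian product of evenly spaced linear and angular velocities.

    Assumes n_v >= 2 and n_w >= 2 so that the grid spacing is defined.
    """
    # Linear velocities span [0, v_max]; angular velocities span [-w_max, w_max].
    lin = [v_max * i / (n_v - 1) for i in range(n_v)]
    ang = [-w_max + 2.0 * w_max * j / (n_w - 1) for j in range(n_w)]
    return list(itertools.product(lin, ang))


def pursuer_speed(iteration, start, step, every, cap):
    """Step curriculum: add `step` to the speed every `every` iterations, up to `cap`."""
    return min(cap, start + step * (iteration // every))


primitives = make_motion_primitives(v_max=1.0, w_max=2.0, n_v=3, n_w=5)
print(len(primitives))  # 15 primitives: 3 linear x 5 angular velocities
print(primitives[0])    # (0.0, -2.0)
print(pursuer_speed(2500, start=0.5, step=0.25, every=1000, cap=2.0))  # 1.0
```

A random evader would then sample one `(v, w)` pair from `primitives` at each decision step, while `pursuer_speed` would be queried once per training iteration to set the opponent's speed.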
bzx20 commented
RMA Method Applied to Dualquad2d Env
Record of revisions:
- `render_fn`: to render the dualquad2d env, revise `env.reset` and `controller.update_params`
- `get_obs_paramsonly()`
TODO: How to include the future state? In the form of a future trajectory?

```python
from functools import partial

import chex
import jax
import jax.numpy as jnp


# Method of the dualquad2d environment class; EnvStateDual2D and
# EnvParamsDual2D are the project's own state and parameter types.
@partial(jax.jit, static_argnums=(0,))
def get_obs_paramsonly(
    self, state: EnvStateDual2D, params: EnvParamsDual2D
) -> chex.Array:
    ### TO BE REVISED
    obs_elements = [
        jnp.array(
            [
                # mass
                # (params.m - params.m_mean)/params.m_std,
                # action_scale
                (params.action_scale - params.action_scale_mean)/params.action_scale_std,
                # 1st order alpha
                # (params.alpha_bodyrate - params.alpha_bodyrate_mean)/params.alpha_bodyrate_std,
                # object mass
                (params.mo - params.mo_mean)/params.mo_std,
                # rope length
                (params.l - params.l_mean) / params.l_std,
            ]
        )
    ]  # tmp:3
    obs = jnp.concatenate(obs_elements, axis=-1)
    return obs
```
For now I have cut some of the params; this is still to be revised.
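To sanity-check the normalization in `get_obs_paramsonly`, the same z-scoring can be reproduced in plain Python, without JAX. This is only a sketch: `ToyParams` and all of its numeric values are hypothetical stand-ins for `EnvParamsDual2D`, and `params_only_obs` is an illustrative helper, not part of the project.

```python
from dataclasses import dataclass


@dataclass
class ToyParams:
    # Hypothetical values; the real EnvParamsDual2D fields and statistics may differ.
    action_scale: float = 1.2
    action_scale_mean: float = 1.0
    action_scale_std: float = 0.1
    mo: float = 0.03  # object mass
    mo_mean: float = 0.02
    mo_std: float = 0.01
    l: float = 0.3  # rope length
    l_mean: float = 0.3
    l_std: float = 0.1


def params_only_obs(p):
    """Z-score each kept parameter, in the same order as get_obs_paramsonly."""
    return [
        (p.action_scale - p.action_scale_mean) / p.action_scale_std,
        (p.mo - p.mo_mean) / p.mo_std,
        (p.l - p.l_mean) / p.l_std,
    ]


obs = params_only_obs(ToyParams())
print(len(obs))  # 3, matching the "tmp:3" note in the method
```

With these toy numbers the action scale sits about two standard deviations above its mean, the object mass about one, and the rope length exactly at its mean. That scaling keeps the three inputs on a comparable range before they reach a parameter encoder.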