LeCAR-Lab / CoVO-MPC

Official implementation of the paper "CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design", accepted to L4DC 2024. CoVO-MPC is an optimal sampling-based MPC algorithm.
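
For context, a minimal sketch of the sampling-based MPC loop the paper analyzes: a nominal control sequence is perturbed with Gaussian noise whose covariance is the design object of CoVO-MPC, the perturbed sequences are rolled out, and the nominal sequence is updated by a cost-weighted average. All names, the toy dynamics/cost interfaces, and the softmax weighting are illustrative assumptions, not the repository's API.

    import jax
    import jax.numpy as jnp

    def sampling_mpc_step(key, x0, u_nominal, sigma, dynamics, cost,
                          n_samples=256, temperature=1.0):
        """One sampling-based MPC update (MPPI-style sketch). CoVO-MPC's
        contribution is how the covariance `sigma` is chosen; here it is an input."""
        horizon, act_dim = u_nominal.shape
        noise = jax.random.multivariate_normal(
            key, jnp.zeros(act_dim), sigma, shape=(n_samples, horizon))
        u_samples = u_nominal[None] + noise                    # (N, H, act_dim)

        def rollout(u_seq):
            def step(x, u):
                x_next = dynamics(x, u)
                return x_next, cost(x_next, u)
            _, stage_costs = jax.lax.scan(step, x0, u_seq)
            return stage_costs.sum()

        total_costs = jax.vmap(rollout)(u_samples)             # (N,)
        weights = jax.nn.softmax(-total_costs / temperature)   # low cost -> high weight
        return jnp.einsum("n,nha->ha", weights, u_samples)     # new nominal sequence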

Home Page: https://lecar-lab.github.io/CoVO-MPC/


👨‍👩‍👧‍👦 MA Adaptive Project Record

jc-bao opened this issue

Research Problem

  • Adapt to other policies (like cooperative lifting)
  • Share environment information (partial observation, physical states, interactions between the two drones)

Week 1

Investigate the research problem.

  • Key results:
    • Engineering: Implement dual-quadrotor rigid-link transportation
    • Engineering: Train the environment with a centralized policy (see the sketch after this list)
  • Progress
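
For the centralized-policy item above, a rough sketch of what "centralized" could mean for the two-quadrotor rigid-link env: a single policy sees both drones (plus the payload) and outputs a joint action that is split per drone. All names and shapes are hypothetical, not the actual env interface.

    import jax.numpy as jnp

    def centralized_obs(obs_quad1, obs_quad2, obs_object):
        # One policy observes both drones' states and the shared payload state.
        return jnp.concatenate([obs_quad1, obs_quad2, obs_object], axis=-1)

    def split_joint_action(joint_action, act_dim_per_quad=2):
        # The centralized policy outputs one joint action; split it back per drone.
        return (joint_action[..., :act_dim_per_quad],
                joint_action[..., act_dim_per_quad:])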


Literature Review

Learning Vision-based Pursuit-Evasion Robot Policies

Basics

  • Addresses the task of learning strategic robot behavior, particularly pursuit-evasion interactions under real-world constraints.
  • Supervised learning: a fully-observable robot policy provides supervision for a partially-observable one.
  • The quality of the supervision for the partially-observable pursuer policy depends on two factors: the balance between diversity and optimality in the evader's behavior, and the strength of the modeling assumptions in the fully-observable policy.

Details

  • Fully-Observable Policy: future trajectory → latent intent, combined with the relative state to produce $\pi^*$
  • Partially-Observable Policy: estimate & action history → imitated latent intent, combined with the estimate to produce $\pi^p$ (see the sketch below)
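
A schematic of the two-policy structure above as a rough Flax sketch: the fully-observable (teacher) policy encodes the evader's future trajectory into a latent intent z and combines it with the relative state; the partially-observable (student) policy imitates z from its state estimate and action history. Layer sizes, module names, and the 2-D action dimension are illustrative assumptions.

    import jax.numpy as jnp
    import flax.linen as nn

    class FullyObservablePolicy(nn.Module):
        """Teacher: future evader trajectory -> latent intent z; combined with
        the relative state it produces pi* (sizes are placeholders)."""
        latent_dim: int = 8

        @nn.compact
        def __call__(self, future_traj, rel_state):
            z = nn.Dense(self.latent_dim)(nn.relu(nn.Dense(64)(future_traj.reshape(-1))))
            h = jnp.concatenate([rel_state, z])
            action = nn.Dense(2)(nn.relu(nn.Dense(64)(h)))
            return action, z

    class PartiallyObservablePolicy(nn.Module):
        """Student: state estimate + action history -> imitated latent intent
        z_hat; combined with the estimate it produces pi^p."""
        latent_dim: int = 8

        @nn.compact
        def __call__(self, est_rel_state, action_history):
            z_hat = nn.Dense(self.latent_dim)(nn.relu(nn.Dense(64)(action_history.reshape(-1))))
            h = jnp.concatenate([est_rel_state, z_hat])
            action = nn.Dense(2)(nn.relu(nn.Dense(64)(h)))
            return action, z_hat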

Learning Vision-based Pursuit-Evasion Robot Policies

Evader Policy

  • Random policy
    Motion primitives (MP) are the Cartesian product of regularly discretized linear and angular velocities (see the sketch after this list)
  • MARL policy
    $\pi^*(x^{rel},z_t)$
    The evader trains against a pre-trained, fully-observable pursuer policy (using a curriculum in which the pursuer speed is increased at fixed iterations)
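
A minimal sketch of the random evader's motion-primitive set described above (the discretization ranges and counts are made up for illustration):

    import itertools
    import numpy as np

    # Regularly discretized linear and angular velocities (example ranges).
    linear_vels = np.linspace(0.0, 1.0, 5)       # m/s
    angular_vels = np.linspace(-1.0, 1.0, 5)     # rad/s

    # Motion primitives MP = Cartesian product of the two discretizations.
    motion_primitives = np.array(list(itertools.product(linear_vels, angular_vels)))  # (25, 2)

    def random_evader_action(rng: np.random.Generator):
        # Random evader policy: pick one primitive uniformly at random.
        return motion_primitives[rng.integers(len(motion_primitives))]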

RMA Method Applied to Dualquad2d Env

  • TODO: Replace the observation with the actions of the other agent (see the sketch below); write eval functions
  • Current result: based on a simple implementation of RMA
    (result image attached)
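
For the first TODO above, one possible shape for the student observation: drop the privileged env params and instead stack the other agent's recent actions, so an RMA-style adaptation module can infer the interaction online. All names here are hypothetical.

    import jax.numpy as jnp

    def get_student_obs(own_obs, other_agent_action_history, history_len=10):
        # Replace the privileged params (cf. get_obs_paramsonly below) with the
        # other agent's last `history_len` actions, flattened.
        recent = other_agent_action_history[-history_len:].reshape(-1)
        return jnp.concatenate([own_obs, recent], axis=-1)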

RMA Method Applied to Dualquad2d Env

Render results update.
Run command: python train.py --env dualquad2d --RMA
(rendered animation attached)

RMA Method Applied to Dualquad2d Env

Record of some revisions:

  • Added render_fn to render the dualquad2d env; revised env.reset and controller.update_params
  • get_obs_paramsonly() TODO: how to include the future state? In the form of a future trajectory? (see the future-state note below)
    # Needs: from functools import partial; import jax; import jax.numpy as jnp; import chex
    @partial(jax.jit, static_argnums=(0,))
    def get_obs_paramsonly(self, state: EnvStateDual2D, params: EnvParamsDual2D) -> chex.Array:
        ### TO BE REVISED
        # Normalized privileged env params exposed as the observation.
        obs_elements = [
            jnp.array(
                [
                    # mass (cut for now)
                    # (params.m - params.m_mean) / params.m_std,
                    # action_scale
                    (params.action_scale - params.action_scale_mean) / params.action_scale_std,
                    # 1st-order alpha (cut for now)
                    # (params.alpha_bodyrate - params.alpha_bodyrate_mean) / params.alpha_bodyrate_std,
                    # object mass
                    (params.mo - params.mo_mean) / params.mo_std,
                    # rope length
                    (params.l - params.l_mean) / params.l_std,
                ]
            )
        ]  # tmp: 3 active params
        obs = jnp.concatenate(obs_elements, axis=-1)
        return obs
    

For now I have cut some of the params; this is to be revised.

  • About the future state
    What kind of predefined policy should generate it?
    Current: random policy for testing, with obviously bad results
    Possible references: random, MARL, or game-theoretic policies (see the sketch below)
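
One way to answer the future-state question above: append a short window of the predefined policy's future trajectory (however it is generated: random, MARL, or game-theoretic) to the observation. A sketch under that assumption, with hypothetical names.

    import jax.numpy as jnp

    def get_obs_with_future_traj(own_obs, future_traj, horizon=5):
        # `future_traj` holds the next waypoints produced by the predefined policy;
        # take the first `horizon` of them and flatten them into the observation.
        future = future_traj[:horizon].reshape(-1)
        return jnp.concatenate([own_obs, future], axis=-1)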