Our goal is to synthesize a programmatic state machine policy from time-series data while simultaneously inferring a set of high-level labels.
Our system is a discrete-time Markov process defined by:
- a high-level label space $H$ = a set of discrete high-level labels $h \in H$
  - Ex: $h \in$ {ACC, DEC, CON}
- a low-level action space $L$ = a continuous domain of low-level actions $l \in L$: controlled joystick directives, motor inputs, etc.
  - Ex: $l = a \in \mathbb{R}$, where $a$ is the acceleration
- an observed state space $O$ = a continuous domain of observed variables $o \in O$
  - Ex: $o = (x, v) \in \mathbb{R}^2$, where $x$ is the position and $v$ is the velocity
- an action-selection policy (ASP) $\pi: H \times O \rightarrow H$ that maps the current high-level label and the current observed variables to the next high-level label
- a motor model $\phi: H \rightarrow L$ that maps the current high-level label to the current low-level action
- an extra set of domain-specific constants $C$
  - Ex: $C$ = {max_velocity, deceleration_value, acceleration_value, target_position}
We define a trajectory as the time-indexed sequence of high-level labels, observed states, and low-level actions produced by this process.
We assume the problem domain ($H$, $L$, $O$, and $C$) is known.
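As a concrete sketch of the formalism above, here is a hypothetical instantiation for a 1-D "stop at a target position" domain using the ACC/DEC/CON labels. The constant values in `C`, the hand-written `pi`, and the dynamics are all illustrative assumptions, not part of the formal definition; in practice `pi` is the object we want to synthesize.

```python
from enum import Enum

# Hypothetical 1-D "stopping car" instantiation; all constants are made up.
class H(Enum):
    ACC = 0  # accelerate
    DEC = 1  # decelerate
    CON = 2  # hold velocity constant

C = {
    "max_velocity": 10.0,
    "acceleration_value": 2.0,
    "deceleration_value": -4.0,
    "target_position": 100.0,
}

def phi(h):
    """Motor model phi: H -> L, mapping a high-level label to an acceleration."""
    if h is H.ACC:
        return C["acceleration_value"]
    if h is H.DEC:
        return C["deceleration_value"]
    return 0.0

def pi(h, o):
    """A hand-written ASP pi: H x O -> H (a stand-in for the synthesized pi*)."""
    x, v = o
    # Distance needed to brake to a full stop from velocity v.
    braking_dist = v * v / (2.0 * abs(C["deceleration_value"]))
    if C["target_position"] - x <= braking_dist:
        return H.DEC
    if v >= C["max_velocity"]:
        return H.CON
    return H.ACC

def rollout(o0, h0, steps, dt=0.1):
    """Roll out a trajectory: the time-indexed sequence (h_t, o_t, l_t)."""
    h, (x, v) = h0, o0
    traj = []
    for _ in range(steps):
        h = pi(h, (x, v))   # next high-level label
        l = phi(h)          # low-level action (an acceleration)
        traj.append((h, (x, v), l))
        x, v = x + v * dt, max(0.0, v + l * dt)  # discrete-time dynamics
    return traj

traj = rollout(o0=(0.0, 0.0), h0=H.ACC, steps=300)
```

A rollout like this is exactly the kind of demonstration that `data_gen` can produce: the `(x, v)` and `l` columns are observable, while the label column is what we later treat as latent.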
We would like to:
- Infer the values of the high-level labels in the demonstrations
- Synthesize an ASP $\pi^*$ that is maximally consistent with the demonstrations
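One concrete reading of "maximally consistent" (an assumption on our part; the notes above do not commit to a particular metric) is to count how many demonstrated label transitions the ASP reproduces:

$$\pi^* = \arg\max_{\pi} \sum_{t} \mathbf{1}\left[\pi(h_t, o_t) = h_{t+1}\right]$$

where $h_t$ and $o_t$ are the (inferred) high-level label and observation at time $t$ across the demonstrations.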
This project is roughly split into the following components:
- data_gen - using simulations to generate demonstrations
- expectation - using a particle filter to infer the latent high-level labels
- maximization - using program synthesis to produce a candidate ASP
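The expectation step can be sketched as a bootstrap particle filter over label sequences. Everything below is an illustrative assumption: the motor-model table `PHI`, the Gaussian observation model, the sticky uniform proposal (a stand-in for proposing transitions with the current ASP guess), and the particle count are all made up for the 1-D example, not taken from the notes above.

```python
import math
import random

# Assumed motor model phi: H -> L for the 1-D example (hypothetical values).
ACC, DEC, CON = "ACC", "DEC", "CON"
LABELS = [ACC, DEC, CON]
PHI = {ACC: 2.0, DEC: -4.0, CON: 0.0}

def likelihood(l_obs, h, sigma=0.5):
    """Gaussian observation model: how well phi(h) explains the observed action."""
    d = l_obs - PHI[h]
    return math.exp(-d * d / (2 * sigma * sigma))

def particle_filter(demo_actions, n_particles=200, stay_prob=0.9, seed=0):
    """Infer a high-level label sequence for demonstrated actions l_1..l_T."""
    rng = random.Random(seed)
    particles = [[rng.choice(LABELS)] for _ in range(n_particles)]
    for l_obs in demo_actions:
        # Propose: mostly keep the previous label, occasionally switch
        # (a crude stand-in for proposing with the current ASP guess pi).
        for p in particles:
            h = p[-1] if rng.random() < stay_prob else rng.choice(LABELS)
            p.append(h)
        # Weight each particle by the motor-model likelihood, then resample.
        weights = [likelihood(l_obs, p[-1]) for p in particles]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in particles]  # copy shared histories
    # Return the surviving label sequence that best explains the demo.
    best = max(particles,
               key=lambda p: sum(likelihood(l, h)
                                 for l, h in zip(demo_actions, p[1:])))
    return best[1:]  # drop the arbitrary initial label

# Demonstration: accelerate for 5 steps, then brake for 5 steps.
demo = [2.0] * 5 + [-4.0] * 5
labels = particle_filter(demo)
```

The maximization step would then fit a new ASP to the `(h_t, o_t) -> h_{t+1}` pairs implied by these inferred labels, and the two steps alternate in an EM-style loop.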