This is a PyTorch rewrite of the google-research project motion_imitation. More details can be found in the original project.
GitHub Link: https://github.com/google-research/motion_imitation
Project Link: https://xbpeng.github.io/projects/Robotic_Imitation/index.html
For training:
python motion_imitation/run_torch.py --mode train --motion_file 'dog_pace.txt|dog_spin.txt' \
--int_save_freq 10000000 --visualize --num_envs 50 --type_name 'dog_pace'
- mode: train or test
- motion_file: Choose which motion(s) to imitate (| is used to separate multiple motions; see the sketch after this list)
- visualize: Whether to render during training
- num_envs: Number of environments simulated in parallel
- type_name: Name used for the saved model file
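As a minimal sketch (the actual parsing lives in motion_imitation/run_torch.py, so the variable names below are illustrative), a |-separated motion_file argument can be split like this:

```python
# Illustrative only: how a '|'-separated --motion_file value can be split
# into individual motion clip files. The real parsing is done inside
# motion_imitation/run_torch.py.
motion_file = 'dog_pace.txt|dog_spin.txt'
motion_files = motion_file.split('|')   # ['dog_pace.txt', 'dog_spin.txt']
```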
For testing:
python motion_imitation/run_torch.py --mode test --motion_file 'dog_pace.txt' --model_file 'file_path' \
--encoder_file 'file_path' --visualize
- file_path: The path of the saved model parameters zip file; find it and copy its path.
In this project, I do not fit the encoder with a Gaussian distribution; instead I use an MLP network with one hidden layer. The encoder loss function is -torch.sum(F.softmax(latent_param, dim=0) * advantages.reshape(-1, 1), dim=1).max(). The final loss function is policy + γ * encoder, and both are optimized by Adam simultaneously. Because there is no real robot, I did not transfer the policy to the real world for testing.
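Below is a minimal PyTorch sketch of this setup. The layer sizes, gamma value, and variable names are assumptions for illustration, not the values used in this repository; only the encoder loss expression is taken from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed, illustrative dimensions and weighting factor.
obs_dim, hidden_dim, latent_dim = 160, 64, 8
gamma = 0.1

# Encoder: an MLP with one hidden layer (no Gaussian distribution is fitted).
encoder = nn.Sequential(
    nn.Linear(obs_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, latent_dim),
)

obs = torch.randn(32, obs_dim)      # dummy batch of observations
advantages = torch.randn(32)        # dummy advantages from the policy update

latent_param = encoder(obs)         # shape (batch, latent_dim)

# Encoder loss as stated above: softmax over the batch dimension, weighted by
# the advantages, summed over the latent dimension, then the maximum, negated.
encoder_loss = -torch.sum(
    F.softmax(latent_param, dim=0) * advantages.reshape(-1, 1), dim=1
).max()

# Final objective: policy loss + gamma * encoder loss, updated in one Adam
# step. policy_loss is a placeholder here; in the real setup it is the policy
# loss and the optimizer also holds the policy network's parameters.
policy_loss = torch.tensor(0.0, requires_grad=True)
total_loss = policy_loss + gamma * encoder_loss

optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```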
For multi-motion skill learning, I one-hot encode each motion as part of the input to the policy network. Meanwhile, I use an MLP classifier network to classify which motion the output of the policy net corresponds to. The classifier loss function is cross entropy: cross_entropy(pre_motion_id, motion_id).
The whole loss function is: alpha * cross_entropy(pre_motion_id, motion_id) + (1 - alpha) * regul_term
Part Ⅰ (the cross-entropy term) ensures that the agent can learn the current motion
Part Ⅱ (the regularization term) ensures that the agent can still perform the motions it learned before
For more details about the regularization term, please check the original paper.
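The sketch below illustrates the multi-motion setup described above: one-hot motion ids concatenated to the policy input, an MLP classifier over the policy output, and the alpha-weighted combined loss. Layer sizes, names, and the alpha value are assumptions; the regularization term is left as a placeholder since its definition is in the original paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_motions = 2                      # e.g. dog_pace and dog_spin
obs_dim, act_dim = 160, 12           # assumed, illustrative sizes
alpha = 0.5                          # assumed weighting factor

# One-hot motion id is concatenated with the observation as the policy input.
motion_id = torch.tensor([0, 1, 0, 1])                  # batch of motion labels
one_hot = F.one_hot(motion_id, num_classes=num_motions).float()
obs = torch.randn(4, obs_dim)
policy_input = torch.cat([obs, one_hot], dim=1)

policy = nn.Sequential(
    nn.Linear(obs_dim + num_motions, 256), nn.ReLU(), nn.Linear(256, act_dim)
)
# MLP classifier that predicts which motion the policy output corresponds to.
classifier = nn.Sequential(
    nn.Linear(act_dim, 64), nn.ReLU(), nn.Linear(64, num_motions)
)

action = policy(policy_input)
pre_motion_id = classifier(action)                      # motion logits

# Part I: cross-entropy on the current motion (learn the new skill).
ce_loss = F.cross_entropy(pre_motion_id, motion_id)
# Part II: regularization term that preserves previously learned motions
# (placeholder; see the original paper for its definition).
regul_term = torch.tensor(0.0)

loss = alpha * ce_loss + (1 - alpha) * regul_term
```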