Deadlock occurs in V-REP during training

Question

RozenAstrayChen opened this issue 4 years ago

Problem description

Hello, I have a problem that occurs during training.
I found that there is a certain chance that the UR5 gets stuck after grasping an object while I train the model,
and the gripper keeps trembling.

The training command I use is:

python main.py --is_sim --push_rewards --experience_replay --explore_rate_decay --save_visualizations

and the console output is:

Training iteration: 8126
Change detected: True (value: 281)
Primitive confidence scores: 0.945526 (push), 1.194881 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (8, 152, 120)
Executing: grasp at (-0.484000, 0.080000, 0.051003)
Current reward: 1.000000
Future reward: 1.194881
Expected reward: 1.000000 + 0.500000 x 1.194881 = 1.597440
Training loss: 0.030402
Experience replay: iteration 7085 (surprise value: 1.742928)
Training loss: 0.012920
Grasp successful: True
Time elapsed: 5.005533

Training iteration: 8127
Change detected: True (value: 1609)
Primitive confidence scores: 1.113359 (push), 1.614010 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (12, 104, 103)
Executing: grasp at (-0.518000, -0.016000, 0.041099)
Current reward: 1.000000
Future reward: 1.614010
Expected reward: 1.000000 + 0.500000 x 1.614010 = 1.807005
Training loss: 0.125703
Experience replay: iteration 2556 (surprise value: 0.628174)
Grasp successful: False
Training loss: 0.000300
Time elapsed: 4.954972

Training iteration: 8128
Change detected: True (value: 847)
Primitive confidence scores: 1.120282 (push), 1.313959 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (4, 104, 136)
Executing: grasp at (-0.452000, -0.016000, 0.041861)
Current reward: 0.000000
Future reward: 1.313959
Expected reward: 0.000000 + 0.500000 x 1.313959 = 0.656979
Training loss: 0.144055
Experience replay: iteration 1956 (surprise value: 0.274301)
Training loss: 0.025816
###################################
...deadlock happens here...
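
For reference, the "Expected reward" lines in the log above all follow the pattern current reward + 0.5 x future reward. The snippet below is only a minimal sketch to re-check that arithmetic against the printed values. The function name `expected_reward` is made up for illustration, and the 0.5 factor is read off the log, not taken from the training code.

```python
def expected_reward(current_reward, future_reward, discount=0.5):
    """Combine rewards the way the log lines appear to: current + discount * future.

    The 0.5 default is the factor printed in the log; it is an assumption here,
    not a value read from the training script.
    """
    return current_reward + discount * future_reward


# (current reward, future reward, expected reward as printed) for iterations 8126-8128
logged = [
    (1.0, 1.194881, 1.597440),
    (1.0, 1.614010, 1.807005),
    (0.0, 1.313959, 0.656979),
]

for current, future, printed in logged:
    value = expected_reward(current, future)
    # The log prints six decimals, so compare with a small tolerance.
    matches = abs(value - printed) < 1e-6
    print(f"{current:.6f} + 0.5 x {future:.6f} -> log says {printed:.6f}, matches: {matches}")
```

The reward values themselves look consistent up to the last printed iteration, so the hang does not seem to be preceded by anything unusual in these numbers.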

V-REP screen

System

Ubuntu 18
Python 3.7.7
CUDA 10.2
PyTorch 1.6
V-REP 4.0
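
One possible way to find out where the Python side is stuck is the standard-library `faulthandler` module, which can dump the tracebacks of all threads if a call takes too long. The snippet below is only a sketch under assumptions: `run_with_hang_report` and `step` are hypothetical names, `step` stands in for whatever runs a single training iteration, and nothing here comes from the training code itself. The 30-second timeout is a guess based on the roughly 5 seconds per iteration shown in the log.

```python
import faulthandler


def run_with_hang_report(step, timeout_s=30.0):
    """Run step() and return its result.

    If step() is still running after timeout_s seconds, faulthandler writes the
    tracebacks of all threads to stderr. The process is not killed (exit=False),
    so the dump only shows where the code is waiting.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False)
    try:
        return step()
    finally:
        # Disarm the timer once the step finishes normally.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Hypothetical stand-in for one training iteration.
    result = run_with_hang_report(lambda: sum(range(10)), timeout_s=30.0)
    print(result)
```

If a traceback is dumped when the hang occurs, it should show which loop or simulator call never returns. That would help tell a wait inside the script apart from a problem on the V-REP side.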

qq 2528702283 · Answer 1 · Mon Jun 26 2023 14:04:20 GMT+0800 (China Standard Time)

Same problem.