ChiWeiHsiao / DeepVO-pytorch

PyTorch Implementation of DeepVO


Concatenating poses in test.py

lichunshang opened this issue · comments

DeepVO-pytorch/test.py, lines 81 to 104 in 14d8790:

```python
if i == 0:
    for pose in batch_predict_pose[0]:
        # use all predicted poses in the first prediction
        for i in range(len(pose)):
            # convert predicted relative pose to absolute pose by adding the last pose
            pose[i] += answer[-1][i]
        answer.append(pose.tolist())
    batch_predict_pose = batch_predict_pose[1:]

# transform from relative to absolute
for predict_pose_seq in batch_predict_pose:
    # predict_pose_seq[1:] = predict_pose_seq[1:] + predict_pose_seq[0:-1]
    ang = eulerAnglesToRotationMatrix([0, answer[-1][0], 0])  # eulerAnglesToRotationMatrix([answer[-1][1], answer[-1][0], answer[-1][2]])
    location = ang.dot(predict_pose_seq[-1][3:])
    predict_pose_seq[-1][3:] = location[:]

    # use only the last predicted pose in the following prediction
    last_pose = predict_pose_seq[-1]
    for i in range(len(last_pose)):
        last_pose[i] += answer[-1][i]
    # normalize angle to -Pi...Pi over the y axis
    last_pose[0] = (last_pose[0] + np.pi) % (2 * np.pi) - np.pi
    answer.append(last_pose.tolist())
```

For this section of the code, what is the significance of the `i == 0` check?

Specifically, at line 99 (shown below), why compose only the last pose? Since all poses returned by the network are relative, shouldn't you compose every relative pose it returns?

```python
last_pose = predict_pose_seq[-1]
```
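To make the question concrete, here is a sketch of what I mean by composing all the relative poses (made-up data; the 6-DoF pose layout is assumed from the snippet, and rotating the translation part is skipped for brevity):

```python
import numpy as np

# Hypothetical stand-ins for test.py's variables: each pose is assumed to be
# three Euler angles followed by an x, y, z translation.
answer = [[0.0] * 6]                                   # absolute trajectory so far
predict_pose_seq = np.array([[0.01, 0.0, 0.0, 1.0, 0.0, 0.0]] * 5)  # relative poses

# Compose *every* relative pose onto the running absolute pose,
# not just predict_pose_seq[-1].
for rel_pose in predict_pose_seq:
    answer.append([r + a for r, a in zip(rel_pose, answer[-1])])
```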

@lichunshang this is needed because of sequence overlap. With the default test parameters you have sequence length = 6, and consecutive sequences are shifted by one frame (overlap = seq_len - 1). So it looks like this:
Sequence 0 images: 1, 2, 3, 4, 5, 6
Sequence 1 images: 2, 3, 4, 5, 6, 7
Sequence 2 images: 3, 4, 5, 6, 7, 8

The current implementation takes the whole of Sequence 0 (the `i == 0` branch) and then adds only the last pose from each of the following sequences, because every earlier pose in those sequences covers a transition that has already been accumulated.
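A small standalone sketch (the frame indices and windowing code are mine, not the repo's) that enumerates which frame-to-frame transitions each window actually contributes:

```python
seq_len = 6
num_frames = 9

# stride-1 windows over the frames, mirroring the test-time overlap
windows = [list(range(start, start + seq_len))
           for start in range(num_frames - seq_len + 1)]

for i, w in enumerate(windows):
    if i == 0:
        # first window: every predicted relative pose covers a new transition
        new_transitions = list(zip(w[:-1], w[1:]))
    else:
        # later windows: only the final transition is unseen
        new_transitions = [(w[-2], w[-1])]
    print(i, new_transitions)
```

Every transition from frame 0 to frame 8 is covered exactly once, which is why dropping all but the last pose of the later windows loses nothing.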

Ahh, I see. In test.py the overlap is set to sequence length minus one,

```python
overlap = seq_len - 1
```

instead of just 1 as in main.py. For sure an odd way to do this.

```python
train_df, valid_df = get_partition_data_info(partition, par.train_video, par.seq_len, overlap=1, sample_times=par.sample_times, shuffle=True, sort=True)
```
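For comparison, a hypothetical helper (the name and signature are mine, not the repo's API) showing how the overlap value changes the sampled windows; overlap is the number of frames shared by consecutive windows, so the stride is seq_len - overlap:

```python
def make_windows(num_frames, seq_len, overlap):
    # overlap = number of frames shared between consecutive windows
    stride = seq_len - overlap
    return [list(range(s, s + seq_len))
            for s in range(0, num_frames - seq_len + 1, stride)]

# Training (main.py): overlap = 1 -> stride 5, windows barely touch.
print(make_windows(12, 6, overlap=1))  # [[0..5], [5..10]]

# Testing (test.py): overlap = seq_len - 1 = 5 -> stride 1, dense windows.
print(make_windows(12, 6, overlap=5))  # [[0..5], [1..6], ..., [6..11]]
```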

Thanks @alexart13