NVlabs / contact_graspnet

Efficient 6-DoF Grasp Generation in Cluttered Scenes


Two GPU training

ymxlzgy opened this issue

Can I use two 12 GB GPUs to train this code? I have modified some parts, but it only uses one. What should I do?

Currently, you need one GPU with more than 24 GB of VRAM, because the full grasp data is loaded into VRAM for efficiency reasons.

You can try to move the grasp data to another GPU, here:

with tf.device(device):
    tf_scene_idcs = tf.constant(np.arange(0, len(pos_contact_points)), tf.int32)
    tf_pos_contact_points = tf.constant(np.array(pos_contact_points), tf.float32)
    tf_pos_contact_dirs = tf.math.l2_normalize(tf.constant(np.array(pos_contact_dirs), tf.float32), axis=2)
    tf_pos_finger_diffs = tf.constant(np.array(pos_finger_diffs), tf.float32)
    tf_pos_contact_approaches = tf.math.l2_normalize(tf.constant(np.array(pos_approach_dirs), tf.float32), axis=2)
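For illustration, a minimal sketch of that change (the device string is an assumption; with two visible cards the second one is usually '/device:GPU:1'):

# Hypothetical: pin the grasp tensors to the second GPU while the network
# itself stays on the first. Assumes two visible GPUs.
device = '/device:GPU:1'
with tf.device(device):
    tf_pos_contact_points = tf.constant(np.array(pos_contact_points), tf.float32)
    # ... place the remaining grasp tensors here in the same way ...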

However, it could be slow due to the communication overhead between the GPUs. Alternatively, if it is not about reproducing the results, train on fewer scenes. Or feed the grasp data in batches during training, which also has some communication overhead; see the sketch below.
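A minimal sketch of that batched-feeding alternative (names like train_op and scene_batches are illustrative, not the repo's actual API; assumes TF1-style graph execution, i.e. tf.compat.v1 under TF2):

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Keep the full annotations in host RAM instead of materializing them as
# tf.constant on the GPU, and feed only the current batch each step.
pos_contact_points = np.load('pos_contact_points.npy')  # assumed host-side array

tf_pos_contact_points = tf.placeholder(
    tf.float32, shape=(None,) + pos_contact_points.shape[1:])

# ... build the rest of the graph on tf_pos_contact_points ...

with tf.Session() as sess:
    for batch_idcs in scene_batches:  # per-step scene indices, defined elsewhere
        sess.run(train_op, feed_dict={
            tf_pos_contact_points: pos_contact_points[batch_idcs]})

This trades VRAM for a host-to-device copy per step, which is the communication overhead mentioned above.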

Thanks for your answer. Besides this, I also want to ask:

tmesh = obj.mesh
tmesh_mean = np.mean(tmesh.vertices, 0)          # (3,) centroid of the mesh vertices
tmesh.vertices -= np.expand_dims(tmesh_mean, 0)  # center the mesh at its vertex mean

Why do you translate the vertices? Is there any advantage?
Also here:
mesh_mean = np.mean(obj_mesh.vertices, 0, keepdims=True)  # (1, 3) vertex centroid
obj_mesh.vertices -= mesh_mean                            # center the mesh

and here:
mesh_mean = np.mean(obj_mesh.vertices, 0, keepdims=True)  # the same centering at a second site
obj_mesh.vertices -= mesh_mean

The original grasp annotations are defined w.r.t. an object coordinate system whose origin lies at the mean of the mesh vertices, so we kept this definition for the scenes as well. If your grasp annotations are defined w.r.t. a different object coordinate frame, you can probably just remove these lines.
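As an illustration (a sketch, not repo code): if your grasp poses are 4x4 transforms annotated in the raw mesh frame, you could shift their translations instead of dropping the centering, so mesh and grasps stay consistent:

import numpy as np

mesh_mean = np.mean(obj_mesh.vertices, 0, keepdims=True)  # (1, 3) vertex centroid
obj_mesh.vertices -= mesh_mean                            # center the mesh

# grasp_poses: (N, 4, 4) homogeneous transforms in the raw mesh frame (assumed)
grasp_poses[:, :3, 3] -= mesh_mean[0]                     # move grasp origins with the mesh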

Thanks for your time and the explanation! I'll close this issue for now; if I have other questions in the future, I will ask again. Thanks in advance!