NVlabs / contact_graspnet

Efficient 6-DoF Grasp Generation in Cluttered Scenes

Fine-tuning

jucamohedano opened this issue · comments

Hi,

I have successfully picked up objects with a gripper different from the Panda robot using the weights from scene_test_2048_bs3_hor_sigma_001, but I would now like to fine-tune those pretrained weights for the gripper I'm using. I have also attempted training from scratch as you explain in the readme, but I could only do it on a V100 GPU with 16GB, which is less memory than what you suggest. The training result is not good: the confidence of the grasps does not meet the thresholds I set (high=0.25 and low=0.2), and if I visualize them on the test data without thresholding, they look all over the place, with only a few of them looking okay.

I believe that fine-tuning will require much less compute. I think I could do this by setting the optimizer to apply the gradient only to the layers I want, i.e. the last fully connected layers (see the sketch after the traceback), but please let me know if there is a better way. However, when I run train.py with ckpt_dir='checkpoints/scene_test_2048_bs3_hor_sigma_001', I get the following error:

  File "contact_graspnet/train.py", line 225, in <module>
    train(global_config, ckpt_dir)
  File "contact_graspnet/train.py", line 78, in train
    loss_ops = load_labels_and_losses(grasp_estimator, contact_infos, global_config)
  File "/home/juancm/contact_graspnet/contact_graspnet/tf_train_ops.py", line 86, in load_labels_and_losses
    tf_pos_finger_diffs, tf_scene_idcs = load_contact_grasps(contact_infos, global_config['DATA'])
  File "/home/juancm/contact_graspnet/contact_graspnet/tf_train_ops.py", line 254, in load_contact_grasps
    tf_pos_contact_points = tf.constant(np.array(pos_contact_points), tf.float32)
  File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 161, in constant_v1
    allow_broadcast=False)
  File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 300, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/root/miniconda3/envs/contact_graspnet_env/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 522, in make_tensor_proto
    "Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
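For reference, by applying the gradient only to certain layers I mean something like the toy TF1-style sketch below. The graph and the scope name 'fc_final' are made up for illustration; for the real model I would filter tf.trainable_variables() by the actual layer names.

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Toy stand-in graph: a "backbone" layer plus a final fc layer.
x = tf.placeholder(tf.float32, [None, 8])
h = tf.layers.dense(x, 16, activation=tf.nn.relu, name='backbone')
y = tf.layers.dense(h, 1, name='fc_final')
loss = tf.reduce_mean(tf.square(y))

# Keep only the variables of the layers to fine-tune.
finetune_vars = [v for v in tf.trainable_variables() if 'fc_final' in v.name]

# var_list restricts gradient computation and updates to these variables;
# everything else stays frozen at the pretrained weights.
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=finetune_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={x: np.random.rand(4, 8).astype(np.float32)})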

Do you think this is due to my hardware (RTX 3060 and 32GB RAM)? I wasn't planning to train on my laptop, but I wanted to at least check that the training works before running it on a server.

Thank you! :)

Hi @jucamohedano ,

sorry for the late answer. I used a rather simple but efficient approach: loading all contact points into GPU memory as a single tf.constant. However, TF limits the size of a single tensor proto to 2GB. I guess your fine-tuning dataset has more contact points than my dataset, so this 2GB limit is exceeded. You could try casting the data to tf.float16, or loading the contact data in chunks rather than all at once.
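A rough, untested sketch of the chunked variant (the chunk_size and the stand-in data are arbitrary and only illustrate the idea):

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def load_large_constant(array, chunk_size=100000):
    # Each tf.constant's proto stays below the 2GB per-tensor limit;
    # tf.concat then assembles the full tensor on the device.
    chunks = [tf.constant(array[i:i + chunk_size], tf.float32)
              for i in range(0, array.shape[0], chunk_size)]
    return tf.concat(chunks, axis=0)

# Stand-in for pos_contact_points in load_contact_grasps; casting the
# array to float16 beforehand would additionally halve the proto size.
pos_contact_points = np.random.rand(300000, 3).astype(np.float32)
tf_pos_contact_points = load_large_constant(pos_contact_points)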