real-stanford / scalingup

[CoRL 2023] This repository contains data generation and training code for Scaling Up & Distilling Down

Home Page: https://www.cs.columbia.edu/~huy/scalingup/

Problems When using CACHE

Louis-ZhangLe opened this issue

Thanks for your great work. When I was reproducing your work using the cache, I could not find a matching hash key in the responses file. Looking forward to your reply, thank you.

Hey! I'm able to reproduce your error. When I roll back to the first commit, 3d2f43c, I no longer get the same error. My guess is that this bug was introduced in 218a618.

Unfortunately, I won't have time to fix this issue for another week. For now, if you don't need the FR5 robot, could you also use 3d2f43c? Thanks!
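For reference, pinning the repo to that commit should just be `git checkout 3d2f43c` from a standard clone (and reinstalling the package afterwards, if dependencies changed between commits).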

Thank you for your answer. I'll give it a try first. Also, the reason I didn't use the OpenAI API is that there were no logprobs in the response. May I ask if it is possible to remove logprobs from the code? Looking forward to your reply, thank you.

Ah right, the OpenAI API removed logprobs recently. Anyways, you should be able to remove the logprobs from the completion sampling procedure without affecting the results too much!
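As a rough illustration, removing logprobs from the sampling call could look something like the sketch below. It targets the legacy openai-python (< 1.0) Completion endpoint that the `openai.error` traceback later in this thread comes from; the model name and sampling parameters are placeholders, not the repo's actual values:

```python
import openai  # legacy openai-python (< 1.0), matching the openai.error usage in this thread

def sample_completions(prompt: str, num_samples: int) -> list[str]:
    # Placeholder model/parameters; the actual values live in the scalingup configs.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0.8,
        n=num_samples,
        # logprobs=5,  # dropped: newer API responses no longer include logprobs here
    )
    return [choice.text for choice in response.choices]
```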

Hello, I have successfully run the first-commit version you pointed to. I am first doing the reproduction work on the transport task, but the training results cannot match the results in the paper; the gap is too big. Can you provide more details on model training, such as the training parameters (num_steps_per_update, batch_size, and number of epochs) for each domain? Looking forward to your reply, thank you.

Hey! The default training parameters are the ones I used (batch size of 1024, 10 epochs, 1 num steps per update, etc.).
How many datapoints were used for training?

The transport task has 52,133 datapoints. I found that the default value of num_steps_per_update is 10,000, which would mean the model is updated 10,000 times per epoch. Are you sure you set it to 1?

Also, isn't it necessary to test during training, or is validation enough? In other words, should evaluation.num_episodes=0?

In addition, I found that inference with the diffusion policy is slow and keeps printing warnings such as "WARNING Failed to converge after 299 steps: err_norm=0.104888". Is this normal?

Finally, I would like to ask about the best checkpoint, named "last.ckpt". Why can't I load it? It says there is no such file, even though the path is correct, and other checkpoints can be loaded. Looking forward to your reply, thank you.

Sorry, I was referring to "num_steps_per_update", not "num_timesteps_per_batch".
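If it helps, parameters like these are typically passed as Hydra-style command-line overrides, in the same form as the `evaluation.num_episodes=0` override mentioned above (e.g. appending `num_steps_per_update=1` to the training command). The exact entrypoint and key paths depend on the repo's configs, so treat these key names as illustrative rather than a verified invocation.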

Hi! I just rolled back to the first commit and downloaded the responses files, but I still hit this error: 'openai.error.AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. ...' I have no idea what the reason might be. Looking forward to a reply from both of you, thank you!
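For what it's worth, the error message itself lists the ways the legacy openai-python (< 1.0) library can be given a key; a minimal version in code (names taken straight from the message, the file path is a placeholder) would be:

```python
import os
import openai  # legacy openai-python (< 1.0), matching the openai.error module in the traceback

# Any one of these is sufficient:
openai.api_key = os.environ.get("OPENAI_API_KEY")  # read the key from the environment
# openai.api_key_path = "/path/to/key.txt"         # or point at a file containing the key
```

That said, if the cached responses were being found, presumably no live API call (and hence no key) would be needed, which is what the reply below points at.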

Maybe you can check the path where the cache file is saved. Make sure it is scalingup/scalingup/responses.
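To make the failure mode concrete: a prompt-hash response cache generally works along the lines of the sketch below, so if the cache directory path is wrong, every lookup misses and the code falls through to a live API call, which then fails without a key. This is purely illustrative; the hash function, file layout, and directory name (taken from the suggestion above) are assumptions, not the repo's actual implementation:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("scalingup/scalingup/responses")  # path suggested above

def cached_response(prompt: str) -> dict:
    # Key the cache on a hash of the prompt (the hash choice here is an assumption).
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())  # cache hit: no API call, no key needed
    # Cache miss: a real implementation would call the OpenAI API here,
    # which is where the AuthenticationError above would be raised.
    raise KeyError(f"no cached response for hash {key}")
```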