The processing of Objaverse features
SwimZhang opened this issue · comments
Hi,
When I run `blip_oa.py` in `3DLanguage_data/ChatCaptioner_based/gen_features/`, I get the error: `RuntimeError: Input type (unsigned char) and bias type (c10::Half) should be the same.`
I tried revising the code to `output = visual_encoder(image.float())` at line 167. Is that OK?
Another question: it seems the number of rendered images should be 8, not 4.
> When I run `blip_oa.py` in `3DLanguage_data/ChatCaptioner_based/gen_features/`, I get the error: `RuntimeError: Input type (unsigned char) and bias type (c10::Half) should be the same.`
> I tried revising the code to `output = visual_encoder(image.float())` at line 167. Is that OK?
For the first question, we have already updated the script for generating features. See #43.
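For reference, the dtype mismatch can be sketched in isolation. The `visual_encoder` below is a hypothetical stand-in for the BLIP visual encoder, not the actual model from the script:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the BLIP visual encoder (float32 weights here;
# in the actual script the model may be loaded in half precision, which is
# where the c10::Half in the error message comes from).
visual_encoder = nn.Conv2d(3, 8, kernel_size=3)

# Images loaded from raw pixel buffers arrive as uint8 -- the
# "unsigned char" input type in the error.
image = torch.randint(0, 256, (1, 3, 32, 32), dtype=torch.uint8)

# Casting before the forward pass fixes the mismatch. Matching the model's
# own dtype is more robust than a hard-coded .float() when the encoder
# might run in fp16:
model_dtype = next(visual_encoder.parameters()).dtype
output = visual_encoder(image.to(model_dtype) / 255.0)
print(output.dtype)
```

So `image.float()` works when the model is in float32; casting to the model's own dtype also covers the half-precision case.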
> Another question: it seems the number of rendered images should be 8, not 4.
In our final version, we use 4 images to generate the caption with ChatCaptioner, which produces the best results. However, you can choose any number of viewpoints when generating features by modifying the theta views in these lines:
3D-LLM/3DLanguage_data/ChatCaptioner_based/objaverse_render/render.py
Lines 52 to 63 in 9717617
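As an illustration of equally spaced theta (azimuth) views, here is a hypothetical helper — not the code in `render.py`, which drives Blender — showing how camera positions could be generated for any number of viewpoints:

```python
import math

def camera_positions(num_views, radius=2.0, elevation_deg=30.0):
    """Camera positions on a circle around the object, one per theta view.

    Hypothetical sketch of generalizing the theta list from 4 to 8
    (or any number of) equally spaced azimuth angles.
    """
    elev = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        theta = 2 * math.pi * i / num_views  # equally spaced azimuths
        x = radius * math.cos(elev) * math.cos(theta)
        y = radius * math.cos(elev) * math.sin(theta)
        z = radius * math.sin(elev)  # fixed elevation for every view
        positions.append((x, y, z))
    return positions

# 4 views for ChatCaptioner captions, 8 views for 3D feature extraction.
print(len(camera_positions(4)), len(camera_positions(8)))
```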
Fixed in beff99e.
When generating the 3D features, we used 8 images.