No grid configuration in PLLaVA
farewellthree opened this issue · comments
farewellthree commented
Hello, thanks for the great work! It seems that the PLLaVA code does not use the grid configuration of Dynamic High Resolution from LLaVA-1.6. Is this because leaving it out improves video performance, or is it simply to shorten the sequence length and improve efficiency?
farewellthree commented
Looking forward to the response 👀
ermu2001 commented
Hi,
Sorry for the late response.
I'm not sure whether the grid configuration would improve performance. We did not use dynamic high resolution in order to save compute. We also chose to fill the LLM's context length with more video frames rather than with higher per-frame resolution.
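For anyone reading later, here is a rough back-of-the-envelope sketch of that trade-off. It is not taken from the PLLaVA code. The 576 tokens per tile follows from a 336px CLIP ViT-L/14 encoder (24 x 24 patches); everything else is an illustrative assumption: the 4096-token context, the 512 tokens reserved for text, the 2x2 grid plus one base view (ignoring unpadding and newline tokens), and the 12x12 pooled size per frame. The helper names are hypothetical.

```python
def tokens_per_frame(tokens_per_view: int, grid_tiles: int = 0) -> int:
    """Visual tokens for one frame: one base view plus optional high-res grid tiles."""
    return (1 + grid_tiles) * tokens_per_view


def max_frames(context_len: int, text_reserve: int, per_frame: int) -> int:
    """How many whole frames fit in the context after reserving room for text."""
    return (context_len - text_reserve) // per_frame


# Assumed values, for illustration only.
CONTEXT_LEN = 4096      # hypothetical LLM context length
TEXT_RESERVE = 512      # hypothetical room kept for prompt and answer
TOKENS_PER_VIEW = 576   # 24 x 24 patches from a 336px ViT-L/14 encoder

single_res = tokens_per_frame(TOKENS_PER_VIEW)               # no grid
dynamic_res = tokens_per_frame(TOKENS_PER_VIEW, grid_tiles=4)  # assumed 2x2 grid + base view
pooled = tokens_per_frame(12 * 12)                            # assumed 12x12 pooled features

for name, per_frame in [("single-res", single_res),
                        ("2x2 grid + base", dynamic_res),
                        ("pooled 12x12", pooled)]:
    frames = max_frames(CONTEXT_LEN, TEXT_RESERVE, per_frame)
    print(f"{name}: {per_frame} tokens/frame, {frames} frames fit")
```

Under these assumptions, the grid configuration uses most of the budget on a single frame, while pooled single-resolution frames leave room for many more, which matches the choice described above to spend context on frames rather than resolution.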