microsoft / UFO

A UI-Focused Agent for Windows OS Interaction.

Home Page: https://arxiv.org/abs/2402.07939

Question: Does it only work with GPT-Vision, or can it be made to use other models that accept visual input, such as LLaVA?

dartharva opened this issue.