couldbejake / ImageLocSearchGPTVision

A "binary-search-ish" setup for navigating an image using GPT Vision.

So far this method doesn't work, but here it is in case you want to work on it:

  • Start from the center of a mobile phone screen. While the cursor is not at the button position:
    • Overlay a cursor on top of the image
    • Ask GPT: Left, Right, Up or Down?
    • e.g. GPT says Left
    • Go halfway left
    • Overlay a cursor on top of the image
    • Ask GPT: Left, Right, Up or Down?
    • e.g. GPT says Right
    • Go half of that halfway distance right
    • Position found!
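
The loop above can be sketched roughly as follows. This is only an illustration of the control flow, not the repo's actual code: `locate` and `make_simulated_oracle` are hypothetical names, and the oracle is a stand-in that knows the target position, where the real setup would overlay the cursor on the screenshot and ask GPT Vision. It assumes the first move covers half the distance from the center to the edge and that each later move on the same axis is half the previous one.

```python
from typing import Callable, Tuple


def locate(width: float, height: float,
           ask_direction: Callable[[float, float], str],
           max_steps: int = 40) -> Tuple[float, float]:
    """Move a cursor from the screen center, halving the step after each move."""
    x, y = width / 2, height / 2
    # "Go halfway": half the distance from the center to the edge.
    step_x, step_y = width / 4, height / 4
    for _ in range(max_steps):
        answer = ask_direction(x, y)  # hypothetical: would be a GPT Vision call
        if answer == "left":
            x -= step_x
            step_x /= 2
        elif answer == "right":
            x += step_x
            step_x /= 2
        elif answer == "up":
            y -= step_y
            step_y /= 2
        elif answer == "down":
            y += step_y
            step_y /= 2
        else:  # anything else is treated as "cursor is on the target"
            break
    return x, y


def make_simulated_oracle(target_x: float, target_y: float,
                          tolerance: float = 2.0) -> Callable[[float, float], str]:
    """Stand-in for the model: answers perfectly because it knows the target."""
    def ask(x: float, y: float) -> str:
        if x - target_x > tolerance:
            return "left"
        if target_x - x > tolerance:
            return "right"
        if y - target_y > tolerance:
            return "up"
        if target_y - y > tolerance:
            return "down"
        return "done"
    return ask


if __name__ == "__main__":
    oracle = make_simulated_oracle(800, 300)
    print(locate(1080, 1920, oracle))
```

With a perfect oracle the cursor always ends up within the tolerance, because the remaining distance on each axis never exceeds the last step taken. A single wrong answer from the model breaks that guarantee, since the step size has already shrunk, which may be part of why the method has not worked so far.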

But this one does:

Simpler solution (void and null): split the image into sections, superimpose a red grid over the image, and ask GPT which square the item is in. Possibly use a combination of both to get accurate results.
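
A minimal sketch of the grid bookkeeping, under some assumptions: cells are numbered row by row starting at 0, and `cell_box`, `center` and `refine` are hypothetical helper names. Drawing the red grid and asking GPT are left out; the code only turns the square GPT picks into pixel coordinates. `refine` shows one possible way to combine the two ideas, by re-gridding inside the chosen square.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # left, top, right, bottom


def cell_box(region: Box, rows: int, cols: int, index: int) -> Box:
    """Return the box of cell `index` (row-major, 0-based) in a rows x cols grid."""
    if not 0 <= index < rows * cols:
        raise ValueError(f"cell index {index} is outside a {rows}x{cols} grid")
    left, top, right, bottom = region
    cell_w = (right - left) / cols
    cell_h = (bottom - top) / rows
    row, col = divmod(index, cols)
    return (left + col * cell_w, top + row * cell_h,
            left + (col + 1) * cell_w, top + (row + 1) * cell_h)


def center(box: Box) -> Tuple[float, float]:
    """Center point of a box, e.g. where a tap would be placed."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def refine(region: Box, rows: int, cols: int, answers: Iterable[int]) -> Box:
    """Narrow the region once per answer by re-gridding inside the chosen cell."""
    for index in answers:
        region = cell_box(region, rows, cols, index)
    return region


if __name__ == "__main__":
    screen = (0, 0, 1080, 1920)
    box = cell_box(screen, rows=4, cols=3, index=4)
    print(box, center(box))
    print(refine(screen, 4, 3, [4, 0]))
```

Each answer in `refine` stands for one round of "superimpose the grid on the current region and ask which square", so two rounds on a 4x3 grid already narrow a 1080x1920 screen down to a 120x120 area.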
