gpt4v

There are 36 repositories under the gpt4v topic.

TencentQQGYLab / AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
agent chatgpt generative-ai gpt4 gpt4v llm
Language:Python 5597
X-PLUG / MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
agent gpt4v mllm mobile-agents multimodal multimodal-large-language-models multimodal-agent android app gui mobile automation copilot harmony ios
Language:Python 3633
AmberSahdev / Open-Interface
Control Any Computer Using LLMs.
assistant assistant-computer-control automation gpt gpt4 gpt4v gpt4vision linux llm machine-learning macos openai pyautogui pyinstaller python self-driving self-driving-software windows
Language:Python 1884
reworkd / tarsier
Vision utilities for web interaction agents 👀
gpt4v llms ocr playwright pypi-package python selenium webscraping
Language:Jupyter Notebook 1612
ictnlp / LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
efficient gpt4o gpt4v large-language-models large-multimodal-models llama llava multimodal multimodal-large-language-models video vision vision-language-model visual-instruction-tuning
Language:Python 403
bdekraker / WebcamGPT-Vision
Lightweight GPT-4 Vision processing over the Webcam
chatgpt computer-vision gpt-4 gpt4-api gpt4v openai
Language:JavaScript 282
langgptai / Awesome-Multimodal-Prompts
Prompts for GPT-4V and DALL-E3 that make full use of their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
awesome awesome-list chatgpt dall-e dall-e3 dall-e3-prompts gpt4 gpt4v jailbreak-prompt multimodal multimodal-prompts newbing prompt-engineering prompt-injection prompts
246
ShareGPT4Omni / ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
chatgpt eccv2024 gpt gpt-4v gpt4v instruction-tuning language-model large-language-models large-multimodal-models large-vision-language-models vision-language-model
Language:Python 205
pAIrprogio / vscode-ui-sketcher
Draw your projects to life
gpt4v tldraw ui-design vscode-extension
Language:TypeScript 200
soulteary / amazing-openai-api
Convert different model APIs into the OpenAI API format out of the box.
azure-openai azure-openai-api gemini-pro google-gemini openai openai-api yi-34b yi-34b-chat gpt4v gpt4vision
Language:Go 147
zzxslp / MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
gpt4v llm-agents web-navigation
Language:Python 133
kyegomez / MambaByte
Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta
ai artificial-intelligence gpt4v machine-learning mamba megabyte ml multi-modality tokenizer
Language:Python 114
cameronking4 / sketch2app
The ultimate sketch-to-code app, made using GPT4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app, and it will instantly generate code and a sandbox preview from a simple hand-drawn sketch on paper captured from a webcam.
gpt4-vision sketch2code wireframe gpt4v design2code pad2pixel sketch2app ai-tool app-maker code-generator gpt4 generate-app-ai code-assistant nextjs openai
79
admineral / GPT4-Vision-React-Starter
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
gpt4 gpt4-api gpt4v openai openaiapi ai chatgpt-api gpt-4-vision-preview openai-api gpt4-vision
Language:TypeScript 75
BUAADreamer / Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
llava medical mllm multimodal chinese qwen1-5 ai gpt4v huggingface-datasets minigpt4 transformers llama-factory
Language:Python 74
roboflow / gpt-checkup
Monitor the performance of OpenAI's GPT O3 Mini model over time.
computer-vision gpt4v model-analysis gpt-o1 o1
Language:HTML 34
limeberri / gpt4v-video-voiceover
Video Voiceover with gpt-4o-mini
gpt4v jupyter-notebook openai python streamlit
Language:Jupyter Notebook 33
reidbarber / webmarker
Mark web pages for use with vision-language models
prompt prompt-engineering som vision-language-model set-of-mark claude gemini gpt4o gpt4v llms playwright qwen-vl operator computer-use computer-using-agent cua
Language:TypeScript 30
Azure-Samples / rag-as-a-service-with-vision
This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.
azure-ai-search azure-ai-vision cosmosdb gpt-4o gpt4v gpt4vision llm openai rag vision
Language:Python 26
neka-nat / mylangrobot
Language instructions to mycobot using GPT-4V
chatgpt gpt4v mycobot segment-anything whisper gpt-4-vision gpt-4-vision-preview
Language:Python 22
logicalroot / gpt-4v-demos
🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup
gpt-4 openai python streamlit gpt-4v gpt4 gpt4v
Language:Python 18
kyegomez / HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
ai artificial-intelligence ensemble gpt4v machine-learning ml multi-modal multi-modality rt-2 rtx
Language:Python 16
Charmve / gpt-eyes
I GAVE GPT-4 EYES!
agent gpt-4 gpt-4o gpt-4omni gpt4 gpt4v llm llm-inference world-model world-models worldmodel
Language:JavaScript 14
GraphPKU / CoI
Chain of Images for Intuitively Reasoning
chain-of-image chain-of-throught chatbot chatgpt dalle3 gpt4v llama llava multimodal visual-language-models
Language:Python 8
easonlai / webcam_chat_with_aoai_gpt4o
Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!
azure-openai azure-openai-api azureopenai flask flask-web gpt gpt4 gpt4o gpt4v microsoftazure openai opencv-python python python3
Language:HTML 6
elizabethsiegle / stephensmithify-openaivision-sendgrid
Analyze a Video and generate commentary about it with OpenAI's GPT-4V, Text-to-speech, LangChain, Streamlit, Replit, Twilio SendGrid, and OpenCV!
gpt4v langchain openai openai-api opencv-python replit sendgrid streamlit openai-v
Language:Python 5
danomation / Discord-Vision-Bot
Proof-of-concept GPT-4 Vision bot
discord gpt4v gpt-4-vision-preview vision openai pycord
Language:Python 4
dceluis / vacocam_render
Vision-Assisted Camera Orientation
artificial-intelligence computer-vision ffmpeg gpt4 gpt4-api gpt4-vision gpt4v
Language:Jupyter Notebook 4
Envedity / DAIA
Digital Artificial Intelligence Agent
agi ai ai-agent llm llm-agent machine-learning ml auto-agent ai-vision-model gpt4v gpt4vision
Language:Python 3
gpt4api9 / gpt4api9
Sparrow GPTs-API Marketplace
gpt35turbo gpt4 gpt4all-api gpt4api gpt4v openai
3
yunwoong7 / GPT-4V-Examples
Explore the power of GPT-4V with our curated examples and tutorials. This repository offers code snippets, step-by-step guides, and use case demonstrations for integrating GPT-4V into various applications. Perfect for both AI novices and experts!
examples gpt4v openai python tutorials
Language:Jupyter Notebook 3
ethan-yz-hao / equation-ocr-app
OCR application for converting handwritten equations into LaTeX code using OpenAI's GPT-4V API, with a LaTeX renderer for editing and checking (Next.js, TypeScript, OpenAI GPT-4V, KaTeX, Vercel)
gpt4v katex nextjs openai typescript vercel
Language:TypeScript 1
Ravi-Teja-konda / TunedLlavaDelights
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
chatgpt dalle2 dessert finetuning gpt4 gpt4v llama2 llava multi-modality multimodal nutrition nutrition-information stable-diffusion tranformers vision-language-learning vision-language-model
Language:Python 1
sagentic-ai / cupid
Valentine's Day Cupid Agent
ai chatbot gpt4 gpt4v llm bazed-af
Language:TypeScript 1
yunwoong7 / VisionQuery-GPT-4v
VisionQuery GPT-4v is a cutting-edge tool that combines screenshot-based queries with OpenAI's GPT-4. It enables users to capture screens, ask questions, and receive insightful answers from GPT-4v, revolutionizing digital interaction and understanding.
classification face-recognition gpt4v gpt4vision image-recognition object-detection ocr openai-api python
Language:Jupyter Notebook 1
metatatt / iso_bot
ISO 13485 Sniffer Bot, GPT4V with LlamaIndex embedded in a React Bot UI
chatbot gpt4v llamaindex ml nextjs react
Language:TypeScript 0
