tbergman's starred repositories
detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
pixel2style2pixel
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
awesome-openai-vision-api-experiments
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with a web UI
grounded-segment-anything-colab
Grounding DINO with Segment Anything & Stable Diffusion colab
Anything2Image
Generate images from anything with ImageBind and Stable Diffusion
MM-Navigator
LMMs as Smartphone Agents
shapeunity
Learning to Reconstruct 3D Manhattan Wireframes from a Single Image
pointingqa
Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"
audio-retrieval-plugin
FiftyOne Plugin for searching images by audio clip using ImageBind and Qdrant
video-to-frames
Split videos into frames