tbergman

tbergman's starred repositories

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | 29101 384 3464

moviepy

Video editing with Python

Language: Python | License: MIT | 11974 253 1467

GroundingDINO

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | 5306 36 274

layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

Language: Python | License: Apache-2.0 | 4573 71 145

notebooks

Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.

Language: Jupyter Notebook | 4410 68 120

pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework

Language: Jupyter Notebook | License: MIT | 3140 63 316

DIG

A library for graph deep learning research

Language: Python | License: GPL-3.0 | 1804 30 204

awesome-openai-vision-api-experiments

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Language: Python | 1591 26 5

OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023

Language: Jupyter Notebook | License: MIT | 1365 20 105

multimodal-maestro

Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

Language: Python | License: MIT | 974 14 7

GPT-4V-Act

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

Language: JavaScript | 900 21 10

lcnn

LCNN: End-to-End Wireframe Parsing

Language: Python | License: MIT | 481 17 72

examples

Example code and applications for machine learning on Graphcore IPUs

Language: Python | License: MIT | 313 44 3

grounded-segment-anything-colab

Grounding DINO with Segment Anything & Stable Diffusion colab

Language: Jupyter Notebook | License: Unlicense | 189 7 6

Anything2Image

Generate image from anything with ImageBind and Stable Diffusion

Language: Jupyter Notebook | 184 7 14

nx-guides

Examples and IPython Notebooks about NetworkX

Language: Python | License: CC0-1.0 | 180 24 35

ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python | License: Apache-2.0 | 179 5 16

neurvps

Neural Vanishing Point Scanning via Conic Convolution

Language: Python | License: MIT | 174 8 24

MM-Navigator

LMMs as Smartphone Agents

nerd

NeRD: Neural 3D Reflection Symmetry Detector

Language: Python | License: MIT | 99 7 10

holicity

HoliCity: A City-Scale Data Platform for Learning Holistic 3D Structures

Language: Python | License: NOASSERTION | 87 10 15

SoM

Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️

Language: Jupyter Notebook | 74 30

SoM-LLaVA

Empowering Multimodal LLMs with Set-of-Mark Prompting and Improved Visual Reasoning Ability.

Language: Python | 6900

shapeunity

Learning to Reconstruct 3D Manhattan Wireframes from a Single Image

Language: Python | License: MIT | 67 4 11

vecui

Tiny, ergonomic and fun vector library for UI engineers.

Language: TypeScript | 32 1 6

GPT-4V-AD

Code for "Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection"

Language: Python | 21 1 3

pointingqa

Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"

Language: Python | 18 3 1

cookbooks

Templates for computer vision projects, referenced in Roboflow blog posts.

Language: Python | 15 8 2

audio-retrieval-plugin

FiftyOne Plugin for searching images by audio clip using ImageBind and Qdrant

Language: TypeScript | 8 20

video-to-frames

Split videos into frames

Language: Python | License: MIT | 300