xinyu1205

Marrying Grounding DINO with Segment Anything & Tag2Text & Stable Diffusion & BLIP & Whisper - Automatically Recognize, Detect, Segment and Generate Anything with Image, Text, and Speech Inputs

Language: Jupyter Notebook · Apache-2.0 · 010

GroundingDINO

The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python · Apache-2.0 · 010

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.

Language: Python · MIT · 020

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · Apache-2.0 · 000

MiniGPT-4

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

Language: Python · BSD-3-Clause · 010

moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722

Language: Python · NOASSERTION · 020

object_detection_metrics

Object Detection Metrics

MIT · 000

query2labels

Official implementation of the paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".

Language: Python · MIT · 010

rentainhe

010

ssl-small

Code implementation for the paper "On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals".

Language: Python · NOASSERTION · 020

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · Apache-2.0 · 010

txsun1997.github.io

Language: HTML · 020

xinyu1205

020