dolortaste's starred repositories
ControlNet
Let us control diffusion models!
pytorch-widedeep
A flexible package for multimodal deep learning, combining tabular data with text and images using Wide and Deep models in PyTorch
MMSum_model
[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Visionary-Vids
Multi-modal transformer approach for natural language query based joint video summarization and highlight detection
MultiTaskModel
Multi-task models for ESMM and MMoE
roboflow-100-benchmark
Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
paper-reading-note
Reading papers together with Mu Li
awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
paper-reading
Paragraph-by-paragraph close readings of classic and new deep-learning papers
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
awesome-detection-transformer
A collection of papers on transformers for detection and segmentation: Awesome Detection Transformer for Computer Vision (CV)
Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Deformable-DETR
Deformable DETR: Deformable Transformers for End-to-End Object Detection.