dolortaste's starred repositories
ControlNet
Let us control diffusion models!
pytorch-widedeep
A flexible package for multimodal deep learning, combining tabular data with text and images using Wide and Deep models in PyTorch
MMSum_model
[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Visionary-Vids
Multi-modal transformer approach for natural language query based joint video summarization and highlight detection
MultiTaskModel
Multi-task models for ESMM and MMoE
roboflow-100-benchmark
Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
paper-reading-note
Reading papers together with Mu Li
awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
paper-reading
Paragraph-by-paragraph close readings of classic and new deep-learning papers
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
awesome-detection-transformer
A collection of papers on transformers for detection and segmentation: Awesome Detection Transformer for Computer Vision (CV)
Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Deformable-DETR
Deformable DETR: Deformable Transformers for End-to-End Object Detection.