liuguoyou

刘国友's starred repositories

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python · Apache-2.0 · 11677 94 1012

InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Language: Python · Apache-2.0 · 10150 123 196

OutfitAnyone

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

5159 208 51

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python · GPL-3.0 · 3625 34 312

norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.

Language: Python · BSD-3-Clause · 2318 35 158

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python · NOASSERTION · 1123 25 53

PIA

[CVPR 2024] PIA, your Personalized Image Animator. Animate your images by text prompt, combining with Dreambooth, achieving stunning videos. PIA, your personalized image animation generator, uses text prompts to turn images into wonderful animations.

Language: Python · Apache-2.0 · 751 20 33

Vary-toy

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Language: Python · 545 13 30

Visual-Tracking-Development

Visual Object Tracking

Language: Python · 405 16 2

Awesome-Denoise

One-paper-one-short-contribution-summary of all the latest image/burst/video denoising papers with code & citation published in top conferences and journals.

MIT · 392 210

ml-mobileclip

This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024).

Language: Python · NOASSERTION · 381 140

TinySAM

Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"

Language: Python · Apache-2.0 · 365 12 23

mindone

one for all, Optimal generator with No Exception

Language: Python · Apache-2.0 · 319 11 41

RAP-SAM

Language: Python · MIT · 194 10 6

InterpAny-Clearer

Clearer anytime frame interpolation & Manipulated interpolation of anything

Language: Python · MIT · 165 7 12

LivePhoto

Official implementations for paper: LivePhoto: Real Image Animation with Text-guided Motion Control

MIT · 158 35 4

AutoStory

138 22 3

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python · Apache-2.0 · 118 8 21

song-describer-dataset

The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.

Language: Jupyter Notebook · MIT · 116 4 1

SimPFs

Code for "Simple Pooling Front-ends for Efficient Audio Classification", ICASSP 2023

Language: Python · 53 30

DG-SCT

NeurIPS'2023 official implementation code

Language: Python · 48 4 6

SBCFormer

[PyTorch Impl.] SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers - WACV 2024 - Official Code

Language: Python · MIT · 35 20

FMViT

Apache-2.0 · 28 5 3

OST

【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Language: Python · MIT · 27 10

MMSum_model

[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Language: Python · 24 1 3

MISO-VFI

Official implementation of "A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow"

2300

SoS_Dataset

13 5 1

RTQ-MM2023

ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model

Language: Python · BSD-3-Clause · 11 4 5

Emo-CLIM

Emo-CLIM: Emotion-Aligned Contrastive Learning Between Images and Music [ICASSP 2024]

Language: Python · MIT · 600

select_summ

200