刘国友's starred repositories

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 11677 | Issues: 94 | Issues: 1012

InstantID

InstantID : Zero-shot Identity-Preserving Generation in Seconds 🔥

Language: Python | License: Apache-2.0 | Stargazers: 10150 | Issues: 123 | Issues: 196

OutfitAnyone

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | License: GPL-3.0 | Stargazers: 3625 | Issues: 34 | Issues: 312

norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.

Language: Python | License: BSD-3-Clause | Stargazers: 2318 | Issues: 35 | Issues: 158

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1123 | Issues: 25 | Issues: 53

PIA

[CVPR 2024] PIA, your Personalized Image Animator. Animate your images by text prompt, combining with Dreambooth, achieving stunning videos.

Language: Python | License: Apache-2.0 | Stargazers: 751 | Issues: 20 | Issues: 33

Vary-toy

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Awesome-Denoise

One-paper-one-short-contribution-summary of all latest image/burst/video Denoising papers with code & citation published in top conference and journal.

License: MIT | Stargazers: 392 | Issues: 21 | Issues: 0

ml-mobileclip

This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024

Language: Python | License: NOASSERTION | Stargazers: 381 | Issues: 14 | Issues: 0

TinySAM

Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"

Language: Python | License: Apache-2.0 | Stargazers: 365 | Issues: 12 | Issues: 23

mindone

one for all, Optimal generator with No Exception

Language: Python | License: Apache-2.0 | Stargazers: 319 | Issues: 11 | Issues: 41

Language: Python | License: MIT | Stargazers: 194 | Issues: 10 | Issues: 6

InterpAny-Clearer

Clearer anytime frame interpolation & Manipulated interpolation of anything

Language: Python | License: MIT | Stargazers: 165 | Issues: 7 | Issues: 12

LivePhoto

Official implementations for paper: LivePhoto: Real Image Animation with Text-guided Motion Control

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | License: Apache-2.0 | Stargazers: 118 | Issues: 8 | Issues: 21

song-describer-dataset

The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.

Language: Jupyter Notebook | License: MIT | Stargazers: 116 | Issues: 4 | Issues: 1

SimPFs

Code for "Simple Pooling Front-ends for Efficient Audio Classification", ICASSP 2023

Language: Python | Stargazers: 53 | Issues: 3 | Issues: 0

DG-SCT

NeurIPS'2023 official implementation code

SBCFormer

[Pytorch Impl.] SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers -WACV2024 -Official Code

Language: Python | License: MIT | Stargazers: 35 | Issues: 2 | Issues: 0

OST

[CVPR'24] OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Language: Python | License: MIT | Stargazers: 27 | Issues: 1 | Issues: 0

MMSum_model

[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

MISO-VFI

Official implementation of "A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow"

Stargazers: 23 | Issues: 0 | Issues: 0

RTQ-MM2023

ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model

Language: Python | License: BSD-3-Clause | Stargazers: 11 | Issues: 4 | Issues: 5

Emo-CLIM

Emo-CLIM: Emotion-Aligned Contrastive Learning Between Images and Music [ICASSP 2024]

Language: Python | License: MIT | Stargazers: 6 | Issues: 0 | Issues: 0

Stargazers: 2 | Issues: 0 | Issues: 0