yhzhouowo (Mortyzhou-Shef-BIT)



Location: UoS -> NUS & BIT

Home Page: https://mortyzaigc.netlify.app/


yhzhouowo's starred repositories

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 7265 | Issues: 88 | Issues: 112

ProPainter

[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting

Language: Python | License: NOASSERTION | Stargazers: 5154 | Issues: 49 | Issues: 77

swift

ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...).

Language: Python | License: Apache-2.0 | Stargazers: 2502 | Issues: 20 | Issues: 696

FeatUp

Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (ICLR 2024).

Language: Jupyter Notebook | License: MIT | Stargazers: 1303 | Issues: 18 | Issues: 56

ISAT_with_segment_anything

Labeling tool with SAM (Segment Anything Model); supports SAM, sam-hq, MobileSAM, EdgeSAM, etc. An interactive semi-automatic image annotation tool.

Language: Python | License: NOASSERTION | Stargazers: 1097 | Issues: 9 | Issues: 151

ODISE

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]

Language: Python | License: NOASSERTION | Stargazers: 831 | Issues: 40 | Issues: 42

ovsam

[arXiv preprint] The official code for the paper "Open-Vocabulary SAM".

Language: Python | License: NOASSERTION | Stargazers: 725 | Issues: 14 | Issues: 29

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 713 | Issues: 12 | Issues: 71

multimodal-prompt-learning

[CVPR 2023] Official repository of the paper titled "MaPLe: Multi-modal Prompt Learning".

Language: Python | License: MIT | Stargazers: 584 | Issues: 6 | Issues: 75

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

soft-vc

Soft speech units for voice conversion

Language: Jupyter Notebook | License: MIT | Stargazers: 391 | Issues: 12 | Issues: 14

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

VisionMamba

Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.

Language: Python | License: MIT | Stargazers: 317 | Issues: 6 | Issues: 17

BARTScore

BARTScore: Evaluating Generated Text as Text Generation

Language: Python | License: Apache-2.0 | Stargazers: 310 | Issues: 7 | Issues: 44

Awesome-Human-Activity-Recognition

An up-to-date, curated list of awesome IMU-based Human Activity Recognition (Ubiquitous Computing) papers, methods, and resources. Note that most of the collected research is based on IMU data.

License: MIT | Stargazers: 227 | Issues: 14 | Issues: 0

PromptSRC

[ICCV'23 Main Track, WECIA'23 Oral] Official repository of the paper titled "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".

Language: Python | License: MIT | Stargazers: 203 | Issues: 5 | Issues: 15

awesome-self-supervised-multimodal-learning

[T-PAMI] A curated list of self-supervised multimodal learning resources.

ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python | License: Apache-2.0 | Stargazers: 148 | Issues: 7 | Issues: 11

MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

Recent-Image-Quality-Related-Papers

A list of image-quality-related papers published in top conferences and journals.

class-incremental-learning

PyTorch implementation of a VAE-based generative classifier, as well as other class-incremental learning methods that do not store data (DGR, BI-R, EWC, SI, CWR, CWR+, AR1, the "labels trick", SLDA).

Language: Python | License: MIT | Stargazers: 70 | Issues: 2 | Issues: 5

PEL4VAD

Official code for "Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection".

Language: Jupyter Notebook | License: MIT | Stargazers: 56 | Issues: 4 | Issues: 18

HammerLLM

1.4B sLLM for Chinese and English - HammerLLM🔨

Language: Python | License: MIT | Stargazers: 42 | Issues: 4 | Issues: 1

Multimodal-Learning-with-Alternating-Unimodal-Adaptation

Multimodal Learning Method MLA for CVPR 2024

Language: Python | Stargazers: 30 | Issues: 0 | Issues: 0

ICLR2024-REDL

[ICLR 2024 Spotlight] R-EDL: Relaxing Nonessential Settings of Evidential Deep Learning

Language: Python | License: MIT | Stargazers: 29 | Issues: 1 | Issues: 1

JoMoLD

[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

ACES

Audio Captioning Evaluation on Semantics of Sound (ACES)

Language: Jupyter Notebook | License: MIT | Stargazers: 8 | Issues: 1 | Issues: 0

r2bench

[ECCV 2024] R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

Language: Python | Stargazers: 8 | Issues: 0 | Issues: 0