Mark12Ding

Mark Ding's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | 791000

PointLLM

[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds

Language: Python | 46100

EDGE

Official PyTorch Implementation of EDGE (CVPR 2023)

Language: Python | License: MIT | 42200

Moore-AnimateAnyone

Character Animation (AnimateAnyone, Face Reenactment)

Language: Python | License: Apache-2.0 | 300200

PySceneDetect

Python and OpenCV-based scene cut/transition detection program & library.

Language: Python | License: BSD-3-Clause | 303700

mt-bench-101

[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

License: Apache-2.0 | 2900

POPDG

[CVPR 2024] POPDG: Popular 3D Dance Generation with PopDanceSet

Language: Python | License: MIT | 2600

MotionLCM

[ECCV 2024] MotionLCM: Official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"

Language: Python | License: NOASSERTION | 19400

Make-An-Audio-3

Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers

Language: Python | 5800

Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Language: Python | License: MIT | 72700

awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation

License: MIT | 26600

Melodist

Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment

100

ICLR2024-FTIC

[ICLR2024] FTIC: Frequency-aware Transformer for Learned Image Compression

Language: Python | 2800

ECCV2024-AdpatICMH

[ECCV2024] Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation

1600

GTA-Seg

Code for GTA-Seg (NeurIPS2022)

Language: Python | License: Apache-2.0 | 3700

d3fields

[arXiv] D^3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

Language: Python | License: MIT | 10300

nxtp

Object Recognition as Next Token Prediction (CVPR 2024)

Language: Python | License: NOASSERTION | 14600

lm-watermarking

Language: Jupyter Notebook | License: Apache-2.0 | 48300

VidMuse

2200

MambaOut

MambaOut: Do We Really Need Mamba for Vision?

Language: Python | License: Apache-2.0 | 190900

FedScale

FedScale is a scalable and extensible open-source federated learning (FL) platform.

Language: Python | License: Apache-2.0 | 38300

SOFT

[NeurIPS 2021 Spotlight] & [IJCV 2024] SOFT: Softmax-free Transformer with Linear Complexity

Language: Python | License: MIT | 30000

madmom

Python audio and music signal processing library

Language: Python | License: NOASSERTION | 128700

harmonixset

The Harmonix Set: Beats, Downbeats, and Structural Annotations for Pop Music

Language: Jupyter Notebook | License: MIT | 14300

all-in-one

All-In-One Music Structure Analyzer

Language: Python | License: MIT | 38600

Long-CLIP

[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python | License: Apache-2.0 | 50900

hierarchical-structure-analysis

Algorithm and data for the paper "Automatic Detection of Hierarchical Structure and Influence of Structure on Melody, Harmony and Rhythm in Popular Music"

Language: Python | License: MIT | 8600

GeoWizard

[ECCV'24] GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Language: Python | 65800

ACE_phonemes

A guide to grapheme-to-phoneme conversion and a phoneme list for the ACE singing voice synthesis engine

Language: Python | License: MIT | 3000

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | 127600