yzyouzhang

You Zhang's starred repositories

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | 23133 250 274

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | 20117 193 363

pydub

Manipulate audio with a simple and easy high level interface

Language: Python | License: MIT | 8535 135 566

neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)

Language: Python | License: NOASSERTION | 4250 61 196

docta

A Doctor for your data

Language: Python | License: NOASSERTION | 2993 110 3

ijepa

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

Language: Python | License: NOASSERTION | 2730 56 55

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | 2138 44 65

awesome-python-scientific-audio

Curated list of python software and packages related to scientific research in audio

1520 79 50

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | 1229 28 79

music_source_separation

Language: Python | License: NOASSERTION | 1228 25 62

Audio-driven-TalkingFace-HeadPose

Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (Arxiv 2020) and "Predicting Personalized Head Movement From Short Video and Speech Signal" (TMM 2022)

Language: Python | 715 25 70

MICA

MICA - Towards Metrical Reconstruction of Human Faces [ECCV2022]

Language: Python | License: NOASSERTION | 523 9 60

LLaSM

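The first open-source, commercially usable dialogue model that supports bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.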

Language: Python | License: Apache-2.0 | 492 13 6

Awesome-Diffusion-Personalization

A collection of resources on personalization with diffusion models.

License: MIT | 417 31 2

CLAP

Learning audio concepts from natural language supervision

Language: Python | License: MIT | 412 14 15

emotion-classification-from-audio-files

Understanding emotions from audio files using neural networks and multiple datasets.

Language: Python | License: GPL-3.0 | 399 12 17

Point-Bind_Point-LLM

Align 3D Point Cloud with Multi-modalities for Large Language Models

Language: Python | License: MIT | 379 15 12

torchsynth

A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

Language: Python | License: Apache-2.0 | 321 12 165

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook | License: MIT | 211 18 24

OTK

A PyTorch implementation of the optimal transport kernel embedding

Language: Python | 109 3 3

OGNet

Code for the CVPR 2020 paper 'Old is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm'

Language: Python | License: MIT | 85 9 9

Synthetic-Voice-Detection-Vocoder-Artifacts

This repository is related to our Dataset and Detection code from the paper: AI-Synthesized Voice Detection Using Neural Vocoder Artifacts accepted in CVPR Workshop on Media Forensic 2023.

Language: Python | License: MIT | 71 9 7