yzyouzhang

You Zhang's starred repositories

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | 23230 249 277

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | 20174 195 364

pydub

Manipulate audio with a simple and easy high level interface

Language: Python | License: MIT | 8572 135 566

neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)

Language: Python | License: NOASSERTION | 4257 61 196

ijepa

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

Language: Python | License: NOASSERTION | 2743 59 56

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | 2147 44 65

awesome-python-scientific-audio

Curated list of Python software and packages related to scientific research in audio

1521 79 50

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | 1238 29 83

music_source_separation

Language: Python | License: NOASSERTION | 1229 25 62

Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

License: MIT | 738 43 11

Audio-driven-TalkingFace-HeadPose

Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (Arxiv 2020) and "Predicting Personalized Head Movement From Short Video and Speech Signal" (TMM 2022)

Language: Python | 714 25 70

MICA

MICA - Towards Metrical Reconstruction of Human Faces [ECCV2022]

Language: Python | License: NOASSERTION | 526 9 60

LLaSM

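The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.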

Language: Python | License: Apache-2.0 | 493 13 6

CLAP

Learning audio concepts from natural language supervision

Language: Python | License: MIT | 421 14 17

emotion-classification-from-audio-files

Understanding emotions from audio files using neural networks and multiple datasets.

Language: Python | License: GPL-3.0 | 399 12 17

Point-Bind_Point-LLM

Align 3D Point Cloud with Multi-modalities for Large Language Models

Language: Python | License: MIT | 382 15 12

torchsynth

A GPU-optional modular synthesizer in PyTorch, 16200x faster than realtime, for audio ML researchers.

Language: Python | License: Apache-2.0 | 321 12 165

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook | License: MIT | 217 18 24

mandelbrotnn

Torturing neural networks by forcing them to learn the Mandelbrot set.

Language: Python | 110 5 1

OTK

A PyTorch implementation of the optimal transport kernel embedding

Language: Python | 109 3 3

OGNet

Code for the CVPR 2020 paper 'Old is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm'

Language: Python | License: MIT | 85 9 9

Synthetic-Voice-Detection-Vocoder-Artifacts

Dataset and detection code for the paper "AI-Synthesized Voice Detection Using Neural Vocoder Artifacts," accepted at the CVPR Workshop on Media Forensics 2023.

Language: Python | License: MIT | 74 9 7