You Zhang (yzyouzhang)

Company: University of Rochester

Location: NY, US

Home Page: https://yzyouzhang.com

Twitter: @yzyouzhang


Organizations
AirLabUR

You Zhang's starred repositories

generative-models

Generative Models by Stability AI

Language: Python | License: MIT | Stargazers: 23133 | Issues: 250 | Issues: 274

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 20117 | Issues: 193 | Issues: 363

pydub

Manipulate audio with a simple and easy high level interface

Language: Python | License: MIT | Stargazers: 8535 | Issues: 135 | Issues: 566

neuralangelo

Official implementation of "Neuralangelo: High-Fidelity Neural Surface Reconstruction" (CVPR 2023)

Language: Python | License: NOASSERTION | Stargazers: 4250 | Issues: 61 | Issues: 196

docta

A Doctor for your data

Language: Python | License: NOASSERTION | Stargazers: 2993 | Issues: 110 | Issues: 3

ijepa

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

Language: Python | License: NOASSERTION | Stargazers: 2730 | Issues: 56 | Issues: 55

AudioLDM2

Text-to-Audio/Music Generation

Language: Python | License: NOASSERTION | Stargazers: 2138 | Issues: 44 | Issues: 65

awesome-python-scientific-audio

Curated list of Python software and packages related to scientific research in audio

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | Stargazers: 1229 | Issues: 28 | Issues: 79

Audio-driven-TalkingFace-HeadPose

Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (arXiv 2020) and "Predicting Personalized Head Movement From Short Video and Speech Signal" (TMM 2022)

MICA

MICA - Towards Metrical Reconstruction of Human Faces [ECCV2022]

Language: Python | License: NOASSERTION | Stargazers: 523 | Issues: 9 | Issues: 60

LLaSM

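The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.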

Language: Python | License: Apache-2.0 | Stargazers: 492 | Issues: 13 | Issues: 6

Awesome-Diffusion-Personalization

A collection of resources on personalization with diffusion models.

CLAP

Learning audio concepts from natural language supervision

Language: Python | License: MIT | Stargazers: 412 | Issues: 14 | Issues: 15

emotion-classification-from-audio-files

Understanding emotions from audio files using neural networks and multiple datasets.

Language: Python | License: GPL-3.0 | Stargazers: 399 | Issues: 12 | Issues: 17

Point-Bind_Point-LLM

Align 3D Point Cloud with Multi-modalities for Large Language Models

Language: Python | License: MIT | Stargazers: 379 | Issues: 15 | Issues: 12

torchsynth

A GPU-optional modular synthesizer in PyTorch, 16200x faster than realtime, for audio ML researchers.

Language: Python | License: Apache-2.0 | Stargazers: 321 | Issues: 12 | Issues: 165

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook | License: MIT | Stargazers: 211 | Issues: 18 | Issues: 24

OTK

A PyTorch implementation of the optimal transport kernel embedding

OGNet

Code for the CVPR 2020 paper 'Old is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm'

Language: Python | License: MIT | Stargazers: 85 | Issues: 9 | Issues: 9

Synthetic-Voice-Detection-Vocoder-Artifacts

This repository relates to the dataset and detection code from our paper "AI-Synthesized Voice Detection Using Neural Vocoder Artifacts", accepted at the CVPR Workshop on Media Forensics 2023.

Language: Python | License: MIT | Stargazers: 71 | Issues: 9 | Issues: 7

multimodal-decoding

Code associated with the paper titled "A high-performance neuroprosthesis for speech decoding and avatar control", published in Nature in 2023.

Language: Jupyter Notebook | Stargazers: 53 | Issues: 2 | Issues: 1

ScalableFHVAE

This repository contains the code to reproduce the core results from the paper "Scalable Factorized Hierarchical Variational Autoencoders"

Audio_Research_in_US

For students who would like to apply for RA, PhD, postdoc in audio research.

T-EER

Official PyTorch implementation of "t-EER: Parameter-Free Tandem Evaluation Metric of Countermeasures and Biometric Comparators"

Language: Python | License: MIT | Stargazers: 11 | Issues: 2 | Issues: 0

Breaking-Security-Critical-Voice-Authentication

Source code for paper "Breaking Security-Critical Voice Authentication".

Language: Python | Stargazers: 9 | Issues: 0 | Issues: 0

ntools_elec

Intracranial Electrode Localization

Language: MATLAB | License: GPL-3.0 | Stargazers: 6 | Issues: 1 | Issues: 3

PhaseAntispoofing_INTERSPEECH

Official repository of the paper "Phase perturbation improves channel robustness for speech spoofing countermeasures"

Language: Python | License: MIT | Stargazers: 6 | Issues: 1 | Issues: 0
Language: Python | Stargazers: 2 | Issues: 0 | Issues: 0