Vancause

Peking University

Zhongjie Ye's repositories

KAMA_AC

Language: Python | 1 0 0

ae-w2v-attention

Language: Python | License: MIT | 0 0 0

audio-classifier

Classify sounds using YouTube-8M and VGGish models

Language: Python | 0 0 0

audioset_tagging_cnn

Language: Python | License: MIT | 0 1 0

CDur

Repository for the paper "Towards duration robust weakly supervised sound event detection"

Language: Python | License: GPL-3.0 | 0 1 0

clip-event

Language: Python | 0 0 0

zjy.github.io

Language: HTML | 0 0 0

coala

COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

License: MIT | 0 0 0

crank

A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder

License: MIT | 0 0 0

DCASE2021-Task1b

Audio-Visual Classifier in Acoustic Scene Classification

Language: Python | License: MIT | 0 1 0

DCASE2021_task6_v2

Code for CVSSP submission to DCASE 2021 Task 6

0 0 0

dcase_2020_T6

2nd place solution for 2020 DCASE challenge task 6 audio captioning. http://dcase.community/challenge2020/task-automatic-audio-captioning-results#wuyusong2020_t6

0 0 0

deepsvg

[NeurIPS 2020] Official code for the paper "DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation". Includes a PyTorch library for deep learning with SVG data.

License: MIT | 0 0 0

DeepXi

Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.

License: MPL-2.0 | 0 0 0

dual_encoding

[CVPR2019] Dual Encoding for Zero-Example Video Retrieval

License: Apache-2.0 | 0 0 0

FeatureCut_Y

0 0 0

FullSubNet

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

License: MIT | 0 0 0

HAKE-Action-Torch

HAKE-Action in PyTorch

License: Apache-2.0 | 0 0 0

Meta-DETR

Meta-DETR: Official PyTorch Implementation

License: MIT | 0 0 0

PAGAN

PAGAN: a phase-adapted GAN for speech enhancement

Language: Python | License: MIT | 0 1 0

ppg-vc

PPG-Based Voice Conversion

License: Apache-2.0 | 0 0 0

PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

Language: Python | License: Apache-2.0 | 0 1 0

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

License: NOASSERTION | 0 0 0

retrieval-augmentation-nn

Generalization of deep neural networks by using the information of nearest training examples

0 0 0

SCAN

PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

Language: Python | License: Apache-2.0 | 0 1 0

SD-FSIC

License: MIT | 0 0 0

SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck

License: MIT | 0 0 0

Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

License: MIT | 0 0 0

vc_Real-Time-Voice-Cloning

Clone of Real-Time-Voice-Cloning for testing

0 2 0

vcc20_baseline_cyclevae

Voice Conversion Challenge 2020 CycleVAE baseline system

License: MIT | 0 0 0