qiuqiangkong's starred repositories

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 66869 | Issues: 555 | Issues: 706

pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Language: Python | License: NOASSERTION | Stargazers: 8537 | Issues: 150 | Issues: 1544

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8304 | Issues: 130 | Issues: 1054

denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Language: Python | License: MIT | Stargazers: 7638 | Issues: 32 | Issues: 284

riffusion

Stable Diffusion for real-time music generation

Language: Python | License: MIT | Stargazers: 3311 | Issues: 38 | Issues: 93

habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.

Language: C++ | License: MIT | Stargazers: 2488 | Issues: 82 | Issues: 757

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1288 | Issues: 24 | Issues: 143

minDiffusion

Self-contained, minimalistic implementation of diffusion models with PyTorch.

musika

Fast Infinite Waveform Music Generation

Language: Python | License: MIT | Stargazers: 663 | Issues: 24 | Issues: 39

ontology

The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.

audio-dataset

Audio Dataset for training CLAP and other models

VAE-CVAE-MNIST

Variational Autoencoder and Conditional Variational Autoencoder on MNIST in PyTorch

AudioMAE

This repo hosts the code and models of "Masked Autoencoders that Listen".

Language: Python | License: NOASSERTION | Stargazers: 510 | Issues: 34 | Issues: 27

sound-spaces

A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.

Language: Python | License: CC-BY-4.0 | Stargazers: 328 | Issues: 16 | Issues: 138

gqn-datasets

Datasets used to train Generative Query Networks (GQNs) in the ‘Neural Scene Representation and Rendering’ paper.

Language: Python | License: Apache-2.0 | Stargazers: 271 | Issues: 18 | Issues: 16

diffq

DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.

Language: Python | License: NOASSERTION | Stargazers: 230 | Issues: 11 | Issues: 8

EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.

Language: Python | License: MIT | Stargazers: 205 | Issues: 5 | Issues: 27

Spherical-Array-Processing

A collection of MATLAB routines for acoustical array processing on spherical harmonic signals, commonly captured with a spherical microphone array.

Language: MATLAB | License: BSD-3-Clause | Stargazers: 162 | Issues: 12 | Issues: 2

SPTS

Official implementation of SPTS: Single-Point Text Spotting (ACM MM 2022 Oral)

AudioLoader

PyTorch Dataset for Speech and Music audio

AudioTaggingDoneRight

Experiments about AudioSet

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 43 | Issues: 6 | Issues: 0

Neural-Scene-Representation-and-Rendering

Generative Query Network for rendering 3D scenes from 2D images

Language: Python | License: MIT | Stargazers: 43 | Issues: 3 | Issues: 0

DCASE2022-data-generator

Data generator for creating synthetic audio mixtures suitable for DCASE Challenge 2022 Task 3

Language: Python | License: NOASSERTION | Stargazers: 28 | Issues: 3 | Issues: 7

DCASE_2022_Task_5

System that ranks 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection

Language: Python | License: NOASSERTION | Stargazers: 22 | Issues: 0 | Issues: 0

rlr-audio-propagation

Audio propagation engine - Meta Reality Labs Research.

Language: C++ | License: NOASSERTION | Stargazers: 17 | Issues: 5 | Issues: 4