Tingle Li's starred repositories

AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Language: Python | License: MIT | Stargazers: 167253 | Issues: 1553 | Issues: 2692

awesome-chatgpt-prompts

A curated collection of prompts for using ChatGPT more effectively.

Language: HTML | License: CC0-1.0 | Stargazers: 111351 | Issues: 1440 | Issues: 0

ImageBind

ImageBind: One Embedding Space to Bind Them All

Language: Python | License: NOASSERTION | Stargazers: 8255 | Issues: 99 | Issues: 89

AugLy

A data augmentations library for audio, image, text, and video.

Language: Python | License: NOASSERTION | Stargazers: 4950 | Issues: 78 | Issues: 74

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | License: MIT | Stargazers: 3449 | Issues: 57 | Issues: 70

stable-audio-tools

Generative models for conditional audio generation

Language: Python | License: MIT | Stargazers: 2565 | Issues: 43 | Issues: 92

audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

Language: Python | License: MIT | Stargazers: 1934 | Issues: 39 | Issues: 43

audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Language: Python | License: MIT | Stargazers: 1826 | Issues: 20 | Issues: 181

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | Stargazers: 1361 | Issues: 28 | Issues: 88

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | Stargazers: 1146 | Issues: 27 | Issues: 75

clean-fid

PyTorch - FID calculation with proper image resizing and quantization steps [CVPR 2022]

Language: Python | License: MIT | Stargazers: 944 | Issues: 9 | Issues: 49

lhotse

Tools for handling speech data in machine learning projects.

Language: Python | License: Apache-2.0 | Stargazers: 936 | Issues: 44 | Issues: 410

av_hubert

A self-supervised learning framework for audio-visual speech

Language: Python | License: NOASSERTION | Stargazers: 835 | Issues: 15 | Issues: 111

audio-dataset

Audio Dataset for training CLAP and other models

textlesslib

Library for Textless Spoken Language Processing

Language: Python | License: MIT | Stargazers: 528 | Issues: 16 | Issues: 24

MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

Language: HTML | License: MIT | Stargazers: 478 | Issues: 16 | Issues: 34

CPC_audio

An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

Language: Python | License: MIT | Stargazers: 347 | Issues: 15 | Issues: 12

lyrebird-wav2clip

Official implementation of the paper "Wav2CLIP: Learning Robust Audio Representations from CLIP".

Language: Python | License: MIT | Stargazers: 324 | Issues: 11 | Issues: 13

frechet-audio-distance

A lightweight library for Frechet Audio Distance calculation.

Language: Python | License: MIT | Stargazers: 231 | Issues: 2 | Issues: 13

awesome-audiovisual-learning

A curated list of audio-visual learning methods and datasets.

VisualVoice

Audio-Visual Speech Separation with Cross-Modal Consistency

Language: Python | License: NOASSERTION | Stargazers: 219 | Issues: 9 | Issues: 31

audio-data-pytorch

A collection of useful audio datasets and transforms for PyTorch.

Language: Python | License: MIT | Stargazers: 130 | Issues: 5 | Issues: 5

selavi

Implementation of the paper "Labelling unlabelled videos from scratch with multi-modal self-supervision", a method that learns clusters from multi-modal data in a self-supervised way.

Language: Python | License: NOASSERTION | Stargazers: 114 | Issues: 12 | Issues: 4

MixGCF

MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems (KDD 2021)

audioscrape

Scrape audio from YouTube and SoundCloud with a simple command-line interface.

Language: Python | License: AGPL-3.0 | Stargazers: 83 | Issues: 3 | Issues: 6

cocktail-fork-separation

Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset

Language: Python | License: MIT | Stargazers: 74 | Issues: 4 | Issues: 2

AudioLoader

PyTorch Dataset for Speech and Music audio

avstyle

Codebase for the paper "Learning Visual Styles from Audio-Visual Associations" (ECCV 2022, in PyTorch).

Language: Python | License: MIT | Stargazers: 14 | Issues: 2 | Issues: 4