imbibekk

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python · License: MIT · 4400 58 149

Awesome-GPTs

Curated list of awesome GPTs 👍.

License: GPL-3.0 · 3018 25 144

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python · License: MIT · 2335 31 112

usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Language: C++ · License: Apache-2.0 · 2065 27 138

AudioSep

Official implementation of "Separate Anything You Describe"

Language: Python · License: MIT · 1547 64 21

segment-anything-fast

A batched offline inference oriented version of segment-anything

Language: Python · License: Apache-2.0 · 1171 10 42

MyHeyGen

Language: Python · 1087 0 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python · License: MIT · 1084 26 72

LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Language: Python · License: Apache-2.0 · 1075 11 55

HeyGenClone

A simple and open-source analogue of the HeyGen system

Language: Python · 861 21 24

parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Language: Python · License: Apache-2.0 · 544 13 139

ChatGPT-in-Slack

A quick demonstration of how to build a Slack app that enables end-users to interact with a ChatGPT bot

Language: Python · License: MIT · 436 15 44

mustango

Mustango: Toward Controllable Text-to-Music Generation

Language: Python · License: MIT · 317 16 12

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Language: Python · License: MIT · 292 10 21

ai-audio-datasets-list

This is a list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications. It is mainly used for speech recognition, speech synthesis, singing voice synthesis, music information retrieval, music generation, etc.

License: MIT · 182 8 1

AQUA-Tk

AQUA-Tk = Audio QUality Assessment-Toolkit. (In development)

Language: Python · License: GPL-3.0 · 93 3 3

sensorium

NeurIPS | 1st place solution for Sensorium 2023 Competition

Language: Python · License: MIT · 22 2 1