lyramakesmusic / descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
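
To make the 90x figure concrete, the arithmetic can be sketched as below. This is only an illustration: it assumes the compression factor is measured against uncompressed 16-bit mono PCM, which the text above does not state explicitly, and the helper names are hypothetical rather than part of the `dac` package.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth=16, channels=1):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000.0


def compressed_bitrate_kbps(sample_rate_hz, compression_factor,
                            bit_depth=16, channels=1):
    """Bitrate after dividing the PCM bitrate by a compression factor."""
    return pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels) / compression_factor


# Assumption: 44.1 kHz, 16-bit, mono input as the uncompressed baseline.
raw = pcm_bitrate_kbps(44_100)
compressed = compressed_bitrate_kbps(44_100, 90)

print(f"raw PCM:    {raw:.1f} kbps")
print(f"compressed: {compressed:.2f} kbps")
```

Under these assumptions the codec output lands at roughly 8 kbps per channel.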

Home Page: https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5

A fork of descript-audio-codec that trains on 44.1 kHz stereo audio.

Installation

pip install git+https://github.com/lyramakesmusic/descript-audio-codec

Programmatic Usage

import dac
from audiotools import AudioSignal

# Download a model
model_path = dac.utils.download(model_type="44khz")
model = dac.DAC.load(model_path)

model.to('cuda')

# Load audio signal file
signal = AudioSignal('input.wav')

# Encode audio signal as one long file
# (may run out of GPU memory on long files)
signal.to(model.device)

x = model.preprocess(signal.audio_data, signal.sample_rate)
z, codes, latents, _, _ = model.encode(x)

# Decode audio signal
y = model.decode(z)

# Alternatively, use the `compress` and `decompress` functions
# to compress long files.

signal = signal.cpu()
x = model.compress(signal)

# Save and load to and from disk
x.save("compressed.dac")
x = dac.DACFile.load("compressed.dac")

# Decompress it back to an AudioSignal
y = model.decompress(x)

# Write to file
y.write('output.wav')
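
The comments above note that encoding a long file in one pass may run out of GPU memory, and that `compress` is the alternative for long files. The general idea of processing audio in fixed-size pieces can be sketched as follows. This is a generic illustration, not the library's implementation: `iter_chunks` is a hypothetical helper, the 10-second chunk length is an arbitrary choice, and a real codec may also overlap or pad chunks.

```python
def iter_chunks(num_samples, chunk_size):
    """Yield (start, end) sample indices that cover [0, num_samples)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_samples, chunk_size):
        # The final chunk is shorter when num_samples is not a multiple
        # of chunk_size.
        yield start, min(start + chunk_size, num_samples)


sample_rate = 44_100   # samples per second
duration_s = 25        # hypothetical file length in seconds
chunk_s = 10           # hypothetical chunk length in seconds

spans = list(iter_chunks(sample_rate * duration_s, sample_rate * chunk_s))
for start, end in spans:
    print(f"samples {start}..{end} ({(end - start) / sample_rate:.1f} s)")
```

Only one chunk has to be resident on the device at a time, so peak memory depends on the chunk length rather than on the length of the file.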

Pre-requisites

Before training, install the development dependencies from the root of a local clone of this repository:

pip install -e ".[dev]"

Single GPU training

export CUDA_VISIBLE_DEVICES=0
python scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/

Multi GPU training

export CUDA_VISIBLE_DEVICES=0,1
torchrun --nproc_per_node gpu scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/

About

License: MIT License


Languages

Python 99.8%, Dockerfile 0.2%