indiscipline / freedataset

A set of freely distributed data, mostly audio. Test your compression on me.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Free Dataset

This is a dataset of various freely-distributed files, primarily intended for testing data compression.

Currently includes only audio.

Naming scheme

Audio

type-channels-samplerate-bitdepth-{description}.wav

  • Type: instruments / music / noises / tones / voice
  • Channels: mono / stereo / number of channels
  • Samplerate: float value in kHz + k
  • Bitdepth: number of bits + k + f for floating-point
  • Description: { + name, notes, other info + }

Peg grammar for Nim pegs:

const DatasetNameRules = """
namingscheme <- {type} '-' {channels} '-' {samplerate} '-' {bitdepth} ('-' description)?
type <- 'instruments' / 'music' / 'noises' / 'tones' / 'voice'
channels <- \d+ / 'mono' / 'stereo'
samplerate <- \d+ ('.' \d+)?  'k'
bitdepth <- ('8' / '16' / '24' / '32' / '64') 'b' 'f'?
description <- '{' {@} '}'"""

Content information

audio/

The set includes files in a wide variety of PCM formats holding various kinds of sounds, including artificial test tones. This means, this group of files does not represent a common set of audio you would find in the wild, treating them jointly is not statistically representative.

File Description License
instruments-stereo-48k-24b-{Drum stem}.wav Drum recording, overhead mics with minimal processing. Typical file for audio-engineering workflows. By @indiscipline. CC BY 4.0
music-stereo-44k-16b-{Spitsbergen 3}.wav Progressive electronica. A great track by Viktor Mayorov, this repo's author late friend, coincidentally released under CC Attribution 3.0 Unported. From the album "Мохообразные, лишайники и цианопрокариоты острова Шпицберген". CC BY 3.0
music-stereo-48k-24b-{Long Way to Tipperary}.wav "It's A Long Way To Tipperary" by Judge & Williams, performed by Victor Novelty Band, 1930 (Victor 22487-B). Public Domain
music-stereo-96k-24b-{Drunken Sailor}.wav "Drunken Sailor", trad. arranged by R.R. Terry, performed by John Goss and the Cathedral Male Voice Quartet, 1927 (His Master's Voice (B 2420)). Public Domain
noises-mono-48k-16bf.wav Pink Noise, Binary White Noise, Brown Noise. One minute of each. CC0
noises-stereo-96k-16b-{faux-stereo;dithered}.wav Dithered faux-stereo. One minute of Pink, Binary White and Brown noises. Converted from a higher bitdepth to 16 bit with a conventional dithering noise added. This introduces small differences between otherwise identical interleaved stereo samples, which broadens the distribution of the sample values. For example, in case of Binary noise, you can see 4a0e 4c0e for left and right channels. CC0
noises-stereo-96k-32bf-{faux-stereo}.wav Faux-stereo: both channels hold identical data, so each other sample is redundant. CC0
tones-stereo-48k-32bf-{sweeps}.wav Sweeps in the audible range, up to ~ 8kHz: 20s sine sweep, 10s triangle sweep, 10s square sweep, 20s of two close sines (beating), 20s of counterdirectional sines. CC0
voice-mono-44.1k-16b-{upconvert}.wav Narration of "The Art of War" by Sun Tzu performed by Moira Fogarty for LibriVox.org, 2006-11-02. Upconvert: Originally an mp3, decoded to WAV PCM, which should theoretically decrease the entropy. Public Domain

Disclaimer

⚠️ Legal status of Public Domain media differs around the world, so take legal advice from a professional.

About

A set of freely distributed data, mostly audio. Test your compression on me.