This script combines several Sound eXchange (SoX) effects to normalise and trim voice recordings that may have been made with different microphones and differing levels of background noise.
- Install SoX (download or just use your favorite package manager)
- Find the directory where SoX is installed (for example /usr/local/bin), place the .sh script there, and make sure it is executable
- Open a terminal and add the following entry to your .bash_profile, adjusting the path if your SoX directory differs:
alias cleanup="/usr/local/bin/cleanup.sh"
(or replace "cleanup" with whatever you'd like to call it)
With the setup out of the way, you can now run the audio cleanup script with cleanup infile outfile, where infile is the audio file you'd like to clean up and outfile is the name and location of the output audio file.
If you're working with a master file of multiple voice clips, you'll first need to break it apart. Record a list of the breakpoints you'd like to use, then run sox infile outfile trim starttime =endtime on the source file once per clip to produce each of the individual audio files you need.
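The splitting step above can be sketched as a small shell helper. This is only an illustration: the split_clips function, the three-column breakpoints file (start, end, clip name), the .wav output extension, and the SOX override variable are all assumptions, not part of the script.

```shell
# Hypothetical helper: split a master recording into individual clips.
# Assumes a breakpoints file with one clip per line: START END NAME
# (times in any format the SoX trim effect accepts, e.g. seconds).
SOX="${SOX:-sox}"   # set SOX=echo to print the commands instead of running them

split_clips() {
    master=$1
    breakpoints=$2
    while read -r start end name; do
        # Skip blank lines and lines starting with '#'
        case $start in ''|'#'*) continue ;; esac
        # "=$end" gives the end position as an absolute time in the master file
        "$SOX" "$master" "$name.wav" trim "$start" "=$end" || return 1
    done < "$breakpoints"
}
```

For example, with a breakpoints file containing the lines "0 4.5 intro" and "4.5 9 outro", calling split_clips master.wav breakpoints.txt would run one trim command per line and write intro.wav and outro.wav.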
You'll probably want to mess with some of the values in the script to tune it to your needs. Here's a brief overview of the SoX effects used. For more/better details, check out the SoX documentation.
remix
- Given as remix -, performs a mix-down of all input channels to mono
highpass
- Follows the form highpass n, where n is a frequency in Hz
- Applies a high-pass filter with the specified cutoff frequency, attenuating content below it (such as low-frequency rumble)
norm
- Alias for gain -n
- Normalises the audio to 0 dB FSD if no argument is provided
- Arguments: -n, where n is in decibels, normalises to n dB below 0 dB FSD (for example, norm -3 normalises to -3 dB FSD)
compand
- Follows the form compand attack1,decay1 soft-knee-dB:in-dB1,out-dB1,...,in-dBn,out-dBn gain initial-volume-dB delay
- Attack and decay are in seconds, with attack typically less than decay
- The soft-knee dB value rounds the points where adjacent segments of the transfer function meet
- The input values must be listed in increasing order and are specified in dB relative to the maximum possible signal amplitude
- This is a great write-up that dives into the intricacies of compand with more clarity and detail than I will even bother to attempt here.
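As a rough illustration of the form above, the sketch below wraps one compand invocation in a hypothetical shell function, compress_voice. Every number is an illustrative assumption rather than a value taken from the script, and the SOX variable exists only so the command can be previewed with SOX=echo.

```shell
# Illustrative values only; tune them for your own recordings.
SOX="${SOX:-sox}"   # set SOX=echo to print the command instead of running it

compress_voice() {
    # attack,decay        0.05,0.2  seconds; attack shorter than decay
    # soft-knee-dB        6
    # in-dB,out-dB pairs  -54,-90   input at -54 dB is pushed down to -90 dB
    #                     -36,-36   input at -36 dB is left unchanged
    #                     -24,-24   input at -24 dB is left unchanged
    #                     0,-12     full-scale input is pulled down to -12 dB
    # gain                0         dB
    # initial-volume-dB   -90
    # delay               0.1       seconds
    "$SOX" "$1" "$2" compand 0.05,0.2 6:-54,-90,-36,-36,-24,-24,0,-12 0 -90 0.1
}
```

Note that the in-dB values (-54, -36, -24, 0) appear in increasing order, as the effect requires.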
vad
- Voice activity detector
- Attempts to trim silence and background sounds from the ends of the audio.
- Can only trim from the front, so the audio must be reversed to trim from the back.
- Options:
  - -T: time constant (seconds) used to help ignore short bursts of sound
  - -p: amount of audio (seconds) to preserve before the trigger point
  - -t: measurement level used to trigger activity detection
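Because vad only trims from the front, trimming both ends means running it twice with a reverse in between. The sketch below shows one way to do that; the trim_silence function, its optional trigger-level argument, and the option values are assumptions chosen for illustration.

```shell
SOX="${SOX:-sox}"   # set SOX=echo to print the command instead of running it

# Hypothetical helper: trim_silence INFILE OUTFILE [TRIGGER_LEVEL]
trim_silence() {
    level=${3:-5}   # value passed to -t; 5 is an illustrative default
    vad_args="vad -T 0.6 -p 0.2 -t $level"
    # Trim the front, reverse, trim the (former) end, then reverse back.
    # $vad_args is left unquoted on purpose so it splits into separate words.
    "$SOX" "$1" "$2" $vad_args reverse $vad_args reverse
}
```

Calling trim_silence raw.wav trimmed.wav 7 would use a trigger level of 7 for both passes.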
fade
- Follows the form fade n, where n is a duration in seconds
- Applies a fade-in of the specified duration to the beginning of the audio clip; running it again on the reversed audio fades the end as well
reverse
- Reverses the audio clip (useful for letting vad trim silence from the end, then flipping the clip back)
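Putting the effects together, a cleanup command along these lines might look like the following sketch. It is an assumption about how the effects could be chained, not a copy of the script, and every value (the 100 Hz cutoff, the compand points, the vad options, the 0.1 s fades, the final norm -0.5) is a placeholder to tune. The clean_voice function name and the SOX variable are hypothetical.

```shell
SOX="${SOX:-sox}"   # set SOX=echo to print the command instead of running it

# Hypothetical helper: clean_voice INFILE OUTFILE
clean_voice() {
    # 1. mix down to mono, 2. filter out low-frequency content, 3. normalise,
    # 4. compress, 5. trim and fade the front, 6. reverse and do the same
    #    for the other end, 7. reverse back and normalise just below 0 dB FSD.
    "$SOX" "$1" "$2" \
        remix - \
        highpass 100 \
        norm \
        compand 0.05,0.2 6:-54,-90,-36,-36,-24,-24,0,-12 0 -90 0.1 \
        vad -T 0.6 -p 0.2 -t 5 \
        fade 0.1 \
        reverse \
        vad -T 0.6 -p 0.2 -t 5 \
        fade 0.1 \
        reverse \
        norm -0.5
}
```

Running with SOX=echo first is a cheap way to check the effect chain before processing real recordings.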