jnj16180340 / webaudio-notes

webaudio-notes

WebAudio inspection

  1. Record to disk using OfflineAudioContext

    • Simplest case??? (maybe not)
    • Accomplishes goals if we can also read from the file
  2. Record to disk using separate stream processing server (Node, Elixir etc)

    • More similar to what CloudSpeech sees
    • Easier to send audio to CloudSpeech
  3. MediaRecorder API

Let's go with (2) for the sake of simple disk writes and CloudSpeech integration.

We can only interact with the raw samples of the WebAudio stream from a ScriptProcessorNode, which processes audio in discrete (time-domain) chunks.
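A minimal sketch of tapping the stream chunk by chunk, assuming a browser `AudioContext` and an existing source node. `tapAudio` and `onChunk` are hypothetical names, not part of these notes; `createScriptProcessor` is the standard (now deprecated) factory method.

```javascript
// Hypothetical helper: forwards each time-domain chunk to onChunk.
function tapAudio(audioContext, sourceNode, onChunk, bufferSize = 4096) {
  // bufferSize frames per chunk, 1 input channel, 1 output channel.
  const processor = audioContext.createScriptProcessor(bufferSize, 1, 1);

  processor.onaudioprocess = (event) => {
    const samples = event.inputBuffer.getChannelData(0);
    // Copy the samples: the underlying buffer may be reused after the callback returns.
    onChunk(new Float32Array(samples));
  };

  sourceNode.connect(processor);
  // Some browsers only fire onaudioprocess when the node reaches a destination.
  processor.connect(audioContext.destination);
  return processor;
}
```

Each chunk arrives as a `Float32Array` of `bufferSize` samples, which can then be written to disk or forwarded to a server.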

General concerns

  • Can't set sample rate of AudioContext. Any resampling should happen in an offline audio context.
  • BUT, we can set sample rate of MICROPHONE using constraints. One way is to hook up a microphone with desired sample rate to the webaudio graph, set its gain to zero, and roll with it :)
  • We might need to downmix stereo->mono with microphone input. It's probably easiest to do this within a ScriptProcessorNode, or on the writer side. ChannelMerger does not necessarily output mono, so be careful.
  • Downmixing (should) also happen if we define a ScriptProcessorNode which only has one input channel :)
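A rough sketch of a manual stereo->mono downmix, assuming both channels arrive as equal-length `Float32Array`s (for example from `getChannelData(0)` and `getChannelData(1)` inside a ScriptProcessorNode callback). `downmixToMono` is a hypothetical name; averaging the two channels is one common choice, not the only one.

```javascript
// Hypothetical helper: average left and right into a single mono channel.
function downmixToMono(left, right) {
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    // Averaging keeps the result within [-1, 1] when both inputs are.
    mono[i] = 0.5 * (left[i] + right[i]);
  }
  return mono;
}

// Inside an onaudioprocess handler this might look like:
//   const mono = downmixToMono(
//     event.inputBuffer.getChannelData(0),
//     event.inputBuffer.getChannelData(1));
```

The same function works on the writer side, as long as the two channels have been de-interleaved first.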

Audio encoding

SEE http://sox.sourceforge.net/AudioFormats.html

Playing mystery formats:

  • We can play raw (headerless) audio with sox
  • Use play instead of sox ... -d, because with sox it's easy to overwrite files by accident!
  • play -r 44100 -e floating-point -b 32 -c 1 -t raw ./1487623877259
  • If we specify 2 channels when there is really only 1, playback runs at double speed and sounds an octave higher.
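The channel-count effect follows from simple arithmetic: the same bytes, read as twice as many channels, hold half as many frames. A small sketch (`rawDurationSeconds` is a hypothetical helper, not something sox provides):

```javascript
// Hypothetical helper: playback duration of headerless audio under an assumed format.
function rawDurationSeconds(byteLength, sampleRate, bitsPerSample, channels) {
  const bytesPerFrame = (bitsPerSample / 8) * channels;
  return byteLength / (bytesPerFrame * sampleRate);
}

// 1764000 bytes of 44.1 kHz float32 audio:
rawDurationSeconds(1764000, 44100, 32, 1); // → 10 (correct: mono)
rawDurationSeconds(1764000, 44100, 32, 2); // → 5  (wrong guess: stereo, plays twice as fast)
```

Running through a few guesses like this can help narrow down the format of an unknown raw file before trying it with play.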

Writing WAV headers to raw audio of known format:

  • Instead of writing wav headers by hand/using crappy node modules, let's use sox:
  • sox -r 44100 -e floating-point -b 32 -c 1 -t raw 1487626939167.raw 1487626939167.wav
  • You can also specify other params such as endian order...
  • play --channels=1 --bits=16 --rate=16000 --encoding=signed-integer --endian=little audio.raw
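As a sketch of how the sox invocations above could be assembled from Node, the helper below builds the argument list for a raw -> WAV conversion. `soxRawInputArgs` and `rawToWavArgs` are hypothetical names; only flags that already appear in these notes are used.

```javascript
// Hypothetical helper: sox flags describing a headerless input file.
function soxRawInputArgs({ rate, encoding, bits, channels, endian }) {
  const args = [
    "-r", String(rate),
    "-e", encoding,
    "-b", String(bits),
    "-c", String(channels),
  ];
  if (endian) args.push(`--endian=${endian}`); // optional, e.g. "little"
  args.push("-t", "raw");
  return args;
}

// Hypothetical helper: full argument list for raw -> WAV.
function rawToWavArgs(format, rawPath, wavPath) {
  return [...soxRawInputArgs(format), rawPath, wavPath];
}

// Assumed usage, with sox installed and on the PATH:
//   require("child_process").execFile("sox", rawToWavArgs(format, "in.raw", "out.wav"), callback);
```

Passing arguments as an array (rather than building a shell string) avoids quoting problems with odd file names.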

Transforming audio with sox:

  • Starting from a WAV file, specify the new format after the input filename. Replace -d with an output filename to write a file instead of playing.

  • Resample: sox bstheme-44k-f32.wav -r 8000 -e floating-point -b 32 -c 1 -d

  • Reformat: sox bstheme-44k-f32.wav -r 44100 -e unsigned-integer -b 8 -c 1 -d

  • 24-bit FLAC is supported by CloudSpeech streaming. Specifying an explicit encoding (-e with signed-integer, unsigned-integer or floating-point) for the FLAC output causes trouble, so leave it out.

  • sox bstheme-44k-f32.wav -r 44100 -b 24 -c 1 -C 8 bstheme-44k-i24-c8.flac

  • LPCM16 WAV files are just raw LPCM16 audio data prepended with a header. Add/strip this header as necessary.

    • Find out bitrate, sampling rate etc. of stream
    • Write WAV header
    • Write data...
  • FLAC stream compression supports LPCM16 (FLAC does not support floats)
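If sox is not at hand, the header steps above can be sketched by hand for the LPCM16 case. This assumes the canonical 44-byte RIFF/WAVE header with a single `fmt ` chunk and a single `data` chunk; `buildWavHeader` is a hypothetical name.

```javascript
// Hypothetical helper: 44-byte WAV header for 16-bit little-endian PCM.
function buildWavHeader(dataByteLength, sampleRate, channels) {
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const view = new DataView(new ArrayBuffer(44));
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataByteLength, true); // remaining file size after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                 // fmt chunk size for PCM
  view.setUint16(20, 1, true);                  // audio format 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, dataByteLength, true);
  return new Uint8Array(view.buffer);
}
```

Prepend the returned bytes to the raw LPCM16 data to get a playable WAV file; dropping the first 44 bytes of such a file recovers the raw audio.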

ScriptProcessorNode

Google CloudSpeech

Audio input can be captured by an application’s microphone or sent from a pre-recorded audio file. Multiple audio encodings are "supported," including FLAC, AMR, PCMU and Linear-16. See the Google docs.

| Encoding | Support | Notes |
| --- | --- | --- |
| ENCODING_UNSPECIFIED | Not specified. Will return result google.rpc.Code.INVALID_ARGUMENT. | |
| LINEAR16 | Uncompressed 16-bit signed little-endian samples (Linear PCM). This is the only encoding that may be used by AsyncRecognize. | Can it be compressed by e.g. gzip? |
| FLAC | This is the recommended encoding for SyncRecognize and StreamingRecognize because it uses lossless compression. 16-bit and 24-bit samples are supported. Not all fields in STREAMINFO are supported. | Clearly AsyncRecognize is not StreamingRecognize! |
| MULAW | 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. | |
| AMR | Adaptive Multi-Rate Narrowband codec. sample_rate must be 8000 Hz. | |
| AMR_WB | Adaptive Multi-Rate Wideband codec. sample_rate must be 16000 Hz. | |
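Since WebAudio hands us 32-bit floats and LINEAR16 wants 16-bit signed little-endian samples, a conversion step is needed somewhere. A minimal sketch, assuming input samples nominally in [-1, 1]; `floatToLinear16` is a hypothetical name and the asymmetric scaling is one common convention.

```javascript
// Hypothetical helper: Float32 samples -> 16-bit signed little-endian PCM bytes.
function floatToLinear16(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range floats do not wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return new Uint8Array(view.buffer);
}
```

The resulting bytes are raw LPCM16, which is also the input FLAC encoding expects in place of floats.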

SEE Encoding
SEE dox
AsyncRecognize: "Long running operation", probably most useful for offline transcription of long audio pieces (limited to 80 minutes; sync/stream are limited to 1 minute). Pass base64-encoded raw audio data OR a file stored in Google Cloud Storage.
SyncRecognize: Functionally similar to Async, but supports more encodings and less audio time.
StreamingRecognize: Stream audio + receive streamed transcription. This is the one we want.

FFR

Websockets/node stream adapter

Node wrappers around sox for easy transcoding

About

License:The Unlicense

