pietrop/five-min-stt

5 min STT

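A module that wraps STT (Speech to Text) modules so that they work on a turnaround time of about five minutes, by splitting the audio into five minute segments that are transcribed concurrently. Refactored from autoEdit2 for use in autoEdit3.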

Setup

git clone git@github.com:pietrop/five-min-stt.git

cd five-min-stt

npm install

Usage

Usage in development

See the example usage in src/.

Usage in production

npm install five-min-stt

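const fiveMinStt = require('five-min-stt');
// `assemblyai`, `ApiKey`, `ffmpegBinPath` and `ffprobeBinPath` are not defined in this
// snippet: supply your own STT client, API key and paths to the ffmpeg/ffprobe binaries.

// Media file to transcribe, and where to write the converted audio
const url = 'https://download.ted.com/talks/KateDarling_2018S-950k.mp4';
const audioFileOutput = './KateDarling_2018S-950k.wav';

// Receives the path of an audio file and resolves to the STT result
const sttTranscribeFunction = async (filePath) => {
  return assemblyai({ ApiKey, filePath });
};

fiveMinStt({ file: url, audioFileOutput, ffmpegBinPath, ffprobeBinPath, sttTranscribeFunction })
  .then((resp) => {
    console.log('example usage, fiveMinStt::', JSON.stringify(resp, null, 2));
  })
  .catch((error) => {
    console.error('example usage, fiveMinStt error::', error);
  });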

Optionally, you can specify audioFileOutput:

const audioFileOutput = './KateDarling_2018S-950k.wav';

fiveMinStt({ file: url, audioFileOutput, ffmpegBinPath, ffprobeBinPath, sttTranscribeFunction }).then((resp) => {
  console.log('example usage, fiveMinStt::', JSON.stringify(resp, null, 2));
});

Note that audioFileOutput is optional:

If it is not provided, the module creates an audio file in a tmp dir on the system, and then deletes it when done.
If a name/path for the audio destination is provided, it is the developer's responsibility to decide whether to keep or delete the audio file.
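
The behaviour described above could be sketched roughly as follows. This is an illustration only, not the module's actual implementation: `resolveAudioOutput`, `cleanUp`, the `isTemporary` flag and the `audio.wav` file name are all hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: decide where the converted audio should be written.
function resolveAudioOutput(audioFileOutput) {
  if (audioFileOutput) {
    // A path was supplied, so the caller owns the file afterwards.
    return { audioPath: path.resolve(audioFileOutput), isTemporary: false };
  }
  // No path supplied: create a unique directory under the system tmp dir.
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'five-min-stt-'));
  return { audioPath: path.join(tmpDir, 'audio.wav'), isTemporary: true };
}

// Hypothetical cleanup: only remove what resolveAudioOutput created itself.
function cleanUp({ audioPath, isTemporary }) {
  if (!isTemporary) return;
  if (fs.existsSync(audioPath)) fs.unlinkSync(audioPath);
  fs.rmdirSync(path.dirname(audioPath));
}
```

Keeping the "who owns this file" decision in a single flag means the cleanup step never touches a path the developer asked for.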

Note that if you are using AssemblyAI STT on a free tier account, there is a limit of one concurrent transcript at a time, after which requests get throttled. Pay as you go accounts have a limit of 32, and exceeding it also results in throttling. A 1 hour file gives 60 min / 5 min = 12 concurrent transcriptions, which is within that limit. See the table below for more examples.

hours	minutes	segment length (min)	concurrent segments
1	60	5	12
2	120	5	24
3	180	5	36

The 3 hour length would go over the limit of 32 concurrent transcriptions, and the 4 segments exceeding it would be throttled.
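
The arithmetic in the table can be checked with a small helper. This is a sketch that assumes every segment is sent to the STT service at the same time; `concurrencyEstimate` is a hypothetical name and is not part of the module.

```javascript
// Hypothetical helper: how many segments a recording produces,
// and how many of them exceed a given concurrency limit.
function concurrencyEstimate(durationMinutes, concurrencyLimit, segmentMinutes = 5) {
  const segments = Math.ceil(durationMinutes / segmentMinutes);
  const overLimit = Math.max(0, segments - concurrencyLimit);
  return { segments, overLimit };
}

console.log(concurrencyEstimate(60, 32)); // { segments: 12, overLimit: 0 }
console.log(concurrencyEstimate(180, 32)); // { segments: 36, overLimit: 4 }
```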

System Architecture

Convert to audio file.
Split audio file into 5 minute segments, if over 5 minutes.
Send segments to STT service.
Re-adjust results by adding offsets to word timings, and combine into one list.
Delete tmp audio segments.
Return resulting transcript.
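
As an illustration of the splitting step, the sketch below lists where each segment would start and how long it would be. `segmentBoundaries` is a hypothetical helper, not the module's own code, and it works in seconds purely for the sake of the example.

```javascript
// Hypothetical helper: start offset and length of each segment, in seconds.
function segmentBoundaries(durationSeconds, segmentSeconds = 5 * 60) {
  const segments = [];
  for (let start = 0; start < durationSeconds; start += segmentSeconds) {
    segments.push({
      index: segments.length,
      start,
      // The last segment may be shorter than the others.
      duration: Math.min(segmentSeconds, durationSeconds - start),
    });
  }
  return segments;
}

// A 12 minute file gives two full segments and one 2 minute segment.
console.log(segmentBoundaries(12 * 60));
```

A file of 5 minutes or less yields a single segment, which matches the "if over 5 minutes" condition in the list above.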

Initially developed to work with @pietrop/assemblyai-node-sdk, but it tries not to be opinionated about which STT service you use. It does, however, assume that the result from the sttTranscribeFunction has a words attribute containing a list of word objects, each with end and start timecodes and a text attribute.

{
    "words": [
        {
            "end": 440,
            "start": 0,
            "text": "You",
            ...
        },
        ...
    ]
}

Note that the script does not modify the unit of the timings for start and end, e.g. if they are in seconds or milliseconds, they stay as they are.
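
The offset adjustment can be sketched as below. This is a minimal illustration, not the module's implementation: `combineSegments` is a hypothetical name, the sample words are made up, and it assumes the results arrive in segment order and that the segment length is given in the same unit as the word timings (milliseconds here).

```javascript
// Hypothetical helper: shift each segment's word timings by that segment's
// offset, then flatten everything into one list of words.
function combineSegments(results, segmentLength) {
  return results.reduce((allWords, result, index) => {
    const offset = index * segmentLength;
    const shifted = result.words.map((word) => ({
      ...word,
      start: word.start + offset,
      end: word.end + offset,
    }));
    return allWords.concat(shifted);
  }, []);
}

const segmentLengthMs = 5 * 60 * 1000;
const combined = combineSegments(
  [
    { words: [{ start: 0, end: 440, text: 'You' }] },
    { words: [{ start: 120, end: 560, text: 'know' }] },
  ],
  segmentLengthMs
);
console.log(combined);
// [ { start: 0, end: 440, text: 'You' }, { start: 300120, end: 300560, text: 'know' } ]
```

Because only an offset is added, the unit of the timings is left untouched, as noted above.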

Development env

npm > 6.1.0
Node 12

The Node version is set in .nvmrc, for use with Node Version Manager (nvm):

nvm use

Linting

This repo uses Prettier for linting. If you are using Visual Studio Code, you can add the Prettier - Code formatter extension and configure the editor to do things like format on save.

You can also run the linting via npm scripts

npm run lint

There is also a pre-commit hook that runs it.

Build

Tests

Deployment

To publish to npm

npm run publish:public

To do a dry run

npm run publish:dry:run
