A module that lets STT (speech-to-text) services return a transcript with a five-minute turnaround time. Refactored from autoEdit2 for use in autoEdit3.
git clone git@github.com:pietrop/five-min-stt.git
cd five-min-stt
npm install
See the example usage in src/.
npm install five-min-stt
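const fiveMinStt = require('five-min-stt');

// `assemblyai`, `ApiKey`, `ffmpegBinPath` and `ffprobeBinPath` are not defined in this snippet;
// supply your own STT client, API key, and paths to the ffmpeg and ffprobe binaries.
const url = 'https://download.ted.com/talks/KateDarling_2018S-950k.mp4';
const audioFileOutput = './KateDarling_2018S-950k.wav';

// Called with the path of an audio file to transcribe; returns the STT result
const sttTranscribeFunction = async (filePath) => {
  return await assemblyai({ ApiKey, filePath });
};

fiveMinStt({ file: url, audioFileOutput, ffmpegBinPath, ffprobeBinPath, sttTranscribeFunction }).then((resp) => {
  console.log('example usage, fiveMinStt::', JSON.stringify(resp, null, 2));
});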
Optionally, you can specify audioFileOutput:
const audioFileOutput = './KateDarling_2018S-950k.wav';
fiveMinStt({ file: url, audioFileOutput, ffmpegBinPath, ffprobeBinPath, sttTranscribeFunction }).then((resp) => {
  console.log('example usage, fiveMinStt::', JSON.stringify(resp, null, 2));
});
Note that audioFileOutput:
- is optional.
- if not provided, the module creates the audio file in a tmp dir on the system and then deletes it when done.
- if a name/path for the audio file destination is provided, it is the developer's responsibility to decide whether to keep or delete the audio file.
Note that if you are using AssemblyAI STT, a free-tier account is limited to one concurrent transcript at a time; requests beyond that are throttled. Pay-as-you-go accounts are limited to 32 concurrent transcripts, and requests beyond that are also throttled. A 1-hour file split into 5-minute segments gives 60 min / 5 min = 12 concurrent transcriptions. See the table below for more examples.
hours | minutes | segment length (min) | concurrent segments |
---|---|---|---|
1 | 60 | 5 | 12 |
2 | 120 | 5 | 24 |
3 | 180 | 5 | 36 |
A 3-hour file produces 36 segments, which exceeds the limit of 32 concurrent transcriptions, so the 4 excess segments would be throttled.
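The arithmetic behind the table can be sketched as below. This is only an illustration: `countSegments` and `segmentsOverLimit` are hypothetical helper names, not part of the five-min-stt API, and the limit of 32 is the pay-as-you-go figure quoted above.

```javascript
// Hypothetical helpers, not part of five-min-stt.

// Number of segments a file is split into (one concurrent transcription each).
function countSegments(durationMinutes, segmentMinutes = 5) {
  return Math.ceil(durationMinutes / segmentMinutes);
}

// How many segments exceed a given concurrency limit (0 if within the limit).
function segmentsOverLimit(durationMinutes, limit, segmentMinutes = 5) {
  return Math.max(0, countSegments(durationMinutes, segmentMinutes) - limit);
}

console.log(countSegments(60)); // 12
console.log(countSegments(180)); // 36
console.log(segmentsOverLimit(180, 32)); // 4
```

A check like this before calling the STT service can help decide whether a long file is likely to hit throttling.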
- Convert the input to an audio file.
- Split the audio file into 5-minute segments, if it is longer than 5 minutes.
- Send the segments to the STT service.
- Re-adjust the results by adding offsets to the word timings, and combine them into one list.
- Delete the tmp audio segments.
- Return the resulting transcript.
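As a rough illustration of the splitting step above, the segment boundaries for a file of known duration could be planned like this. It is a sketch under assumptions, not the module's actual code: `planSegments` is a hypothetical name, and durations are taken to be in seconds.

```javascript
// Hypothetical helper, not part of five-min-stt.
// Returns the start time and duration (in seconds) of each segment.
function planSegments(totalSeconds, segmentSeconds = 300) {
  const segments = [];
  for (let start = 0; start < totalSeconds; start += segmentSeconds) {
    segments.push({
      index: segments.length,
      start,
      // The last segment may be shorter than segmentSeconds.
      duration: Math.min(segmentSeconds, totalSeconds - start),
    });
  }
  return segments;
}

// A 12-minute file gives two full segments and one 2-minute segment.
console.log(planSegments(720));
```

Each `start` value is also the offset that would later be added to that segment's word timings, once converted to the unit the STT service uses.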
Initially developed to work with @pietrop/assemblyai-node-sdk, but it tries not to be opinionated about which STT service you use. It does, however, assume that the result from sttTranscribeFunction has a words attribute containing a list of word objects, each with start and end timecodes and a text attribute.
{
  "words": [
    {
      "end": 440,
      "start": 0,
      "text": "You",
      ...
    },
    ...
  ]
}
Note that the script does not modify the unit of the timings for start and end; e.g. if they are in seconds or milliseconds, they stay that way.
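To make the offset step concrete, here is a minimal sketch of combining per-segment results into one word list. It is not the module's implementation: `combineSegmentWords` is a hypothetical name, the results are assumed to arrive in segment order, and `segmentLength` must be given in the same unit as the word timings (for example 300000 for 5 minutes in milliseconds).

```javascript
// Hypothetical helper, not part of five-min-stt.
// Shifts each segment's word timings by that segment's offset and
// flattens everything into a single list of words.
function combineSegmentWords(segmentResults, segmentLength) {
  return segmentResults.flatMap((result, index) => {
    const offset = index * segmentLength;
    return result.words.map((word) => ({
      ...word,
      start: word.start + offset,
      end: word.end + offset,
    }));
  });
}

// Two segments with timings in milliseconds.
const segmentResults = [
  { words: [{ start: 0, end: 440, text: 'You' }] },
  { words: [{ start: 120, end: 560, text: 'know' }] },
];
const combined = combineSegmentWords(segmentResults, 300000);
console.log(combined[1]); // { start: 300120, end: 300560, text: 'know' }
```

Because the timings are only shifted, never converted, passing an offset in the wrong unit would silently misplace every word after the first segment.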
- npm > 6.1.0
- Node 12
The Node version is set in the .nvmrc file, for use with nvm (Node Version Manager):
nvm use
This repo uses Prettier for linting. If you are using Visual Studio Code, you can add the Prettier - Code formatter extension and configure the editor to do things like format on save.
You can also run the linting via npm scripts:
npm run lint
There is also a pre-commit hook that runs it.
To publish to npm:
npm run publish:public
To do a dry run:
npm run publish:dry:run