indiana-university / automated-transcription-service

Capture key data in DynamoDB

alan-walsh opened this issue

Emily comments

Hi all! As I was running jobs this morning, I had a thought about the reporting stuff we talked about (extracting information from logs). In addition to knowing the file name that was transcribed (to match it to a project/person) and the audio length (to help approximate the cost/value), I wonder if it would be possible to pull out the confidence score? That might at least tell us how well Amazon thinks it's doing! The other thing is that if this is complicated to do, I'm still at a point with this service where I could manually log audio files and their lengths moving forward, or maybe write a little script I could run on the JSON files to extract that information without having to go to the logs (for Amazon as well as GCP)? Just a thought.
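A minimal sketch of the "little script" idea for the Amazon side, not the project's actual code. It assumes the standard Amazon Transcribe output JSON, where each entry in `results.items` has a `type`, an `alternatives` list with a `confidence` string, and (for spoken words) `start_time`/`end_time` strings. The function names are hypothetical, and the audio length is only approximated by the end time of the last recognized word, so trailing silence is not counted. GCP output has a different shape and is not handled here.

```python
import json
from statistics import mean


def summarize_transcript(doc: dict) -> dict:
    """Summarize one Amazon Transcribe result document (already parsed)."""
    items = doc.get("results", {}).get("items", [])
    # Only spoken words count; punctuation items carry no timing.
    words = [i for i in items if i.get("type") == "pronunciation"]
    scores = [
        float(w["alternatives"][0]["confidence"])
        for w in words
        if w.get("alternatives")
    ]
    end_times = [float(w["end_time"]) for w in words if "end_time" in w]
    return {
        "job_name": doc.get("jobName"),
        "word_count": len(words),
        # Unweighted mean of per-word confidence, None if no words were found.
        "mean_confidence": round(mean(scores), 4) if scores else None,
        # Approximation: end time of the last recognized word, in seconds.
        "approx_audio_seconds": max(end_times) if end_times else 0.0,
    }


def summarize_file(path: str) -> dict:
    """Load a Transcribe JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_transcript(json.load(f))
```

Running `summarize_file` over a folder of downloaded JSON files would give one row per job (name, word count, mean confidence, approximate length) that could be pasted into a spreadsheet until the automated reporting exists.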

Step Functions and DynamoDB

A step function is probably a better way to perform Transcribe-to-Docx anyway, so this could be a step in that process: write a record for each job into DynamoDB, which would make reporting that much easier.

  • Note: this should probably happen after the Docx is created, since by then we would have all of the necessary data from the transcription. Pass that data as a message to the next task in the step function.
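A rough sketch of what that final task could look like, under stated assumptions. The event fields (`job_name`, `file_name`, `audio_seconds`, `mean_confidence`) and both function names are hypothetical; nothing here is taken from the existing code. The `table` argument is expected to behave like a boto3 DynamoDB `Table` resource, which exposes `put_item(Item=...)`. Numbers are converted to `Decimal` because the boto3 resource interface rejects Python floats.

```python
from datetime import datetime, timezone
from decimal import Decimal


def build_job_record(job_name, file_name, audio_seconds, mean_confidence):
    """Build the DynamoDB item for one finished transcription job."""
    return {
        "job_name": job_name,  # assumed partition key
        "file_name": file_name,
        # Go through str() so the Decimal matches the printed float value.
        "audio_seconds": Decimal(str(audio_seconds)),
        "mean_confidence": Decimal(str(mean_confidence)),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def record_job(table, event):
    """Write one job record; `event` is the message from the Docx task."""
    item = build_job_record(
        event["job_name"],
        event["file_name"],
        event["audio_seconds"],
        event["mean_confidence"],
    )
    table.put_item(Item=item)
    return item
```

In a Lambda task, `table` would typically come from `boto3.resource("dynamodb").Table(...)` with whatever table name the deployment uses. Keeping the table as a parameter means the record-building logic can be tested without AWS credentials.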