akashmjn / tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

Regression cases

mrienstra opened this issue

@akashmjn mentioned "hard to reliably reproduce" regressions (occasional deletion of sentences), which are also noted briefly in this colab notebook, in response to a detailed regression report from @gotjoshua, quoted below in its entirety:

been doing some testing with this... it seems to do a decent job with finding speaker turns... but it also missed big sections of speech that the normal small.en model caught.

with -tdrz

{
"timestamps": {
"from": "00:11:03,420",
"to": "00:11:24,340"
},
"text": " it is fascinating. Let's go go to Yuval uh finally Yuval give us uh end on some hope here for us alright, I I I need to feel more hopeful than than I do right now 'cause I basically think the world's gonna end quite soon when these when Terminator comes real. Um do you believe that that the planet has the right people and the right place to actually stop that happening?",
"speaker_turn_next": true
},
{
"timestamps": {
"from": "00:11:25,840",
"to": "00:11:26,100"
},
"text": " Mm-hmm.",
"speaker_turn_next": true
},
{
"timestamps": {
"from": "00:11:26,100",
"to": "00:11:46,620"
},
"text": " Mm-hmm.",
"speaker_turn_next": true
},
{
"timestamps": {
"from": "00:11:46,800",
"to": "00:11:51,180"
},
"text": " It's the same with A_I_ and with the technologies of the twenty first century.",
"speaker_turn_next": false
},

same command without -tdrz

{
"timestamps": {
"from": "00:11:03,260",
"to": "00:11:24,340"
},
"text": " I mean it is fascinating. Let's go go to Yuval uh finally. Yuval give us uh end on some hope here for us alright, I I I need to feel more hopeful than than I do right now 'cause I basically think the world's gonna end quite soon when these when Terminator comes real. Um do you believe that that the planet has the right people and the right place to actually stop that happening?"
},
{
"timestamps": {
"from": "00:11:25,600",
"to": "00:11:55,600"
},
"text": " I hope so. What y we know about technology that you know we can use the same technology to build completely different societies. In the twentieth century some people used uh uh trains and radio and electricity to build totalitarian regimes like the Soviet Union, and other people used exactly the same technology to build liberal democracies. It's the same with A_I_ and with the technologies of the twenty first century. We still have a choice about how to employ them. I think that A_"
},

i can provide source file or you can rip it yourself if you want to test: https://www.youtube.com/watch?v=JV9tzdYT5FU
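
Since these deletions are hard to reproduce on demand, one possible way to hunt for more cases is to scan the JSON output for segments that span a long stretch of audio but contain almost no text, like the 20-second " Mm-hmm." segment above. The following is only a rough sketch: the helper names (`to_seconds`, `suspicious_segments`) and both thresholds are made up for illustration, and it assumes the segments have already been loaded into a list of dicts shaped like the output shown in this issue.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp (as in the output above) to seconds."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def suspicious_segments(segments, min_duration=5.0, min_words_per_sec=0.5):
    """Return (from, to, word_count) for long segments with very little text.

    Both thresholds are illustrative guesses, not tuned values.
    """
    flagged = []
    for seg in segments:
        start = to_seconds(seg["timestamps"]["from"])
        end = to_seconds(seg["timestamps"]["to"])
        duration = end - start
        words = len(seg["text"].split())
        # Skip short segments: a brief "Mm-hmm." on its own is perfectly normal.
        if duration >= min_duration and words / duration < min_words_per_sec:
            flagged.append((seg["timestamps"]["from"], seg["timestamps"]["to"], words))
    return flagged


# Three of the -tdrz segments from the report above (speaker_turn_next omitted).
segments = [
    {"timestamps": {"from": "00:11:25,840", "to": "00:11:26,100"},
     "text": " Mm-hmm."},
    {"timestamps": {"from": "00:11:26,100", "to": "00:11:46,620"},
     "text": " Mm-hmm."},
    {"timestamps": {"from": "00:11:46,800", "to": "00:11:51,180"},
     "text": " It's the same with A_I_ and with the technologies of the twenty first century."},
]

print(suspicious_segments(segments))
```

On the segments above this flags only the 00:11:26,100 to 00:11:46,620 span, which is where the non-tdrz run transcribed the start of the "I hope so..." answer. A check like this would not catch every deletion (a dropped sentence inside an otherwise dense segment would pass), so it is best treated as a first filter before comparing against the plain small.en output.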