Picovoice / speech-to-text-benchmark

speech to text benchmark framework

Home Page: https://picovoice.ai/

WER?

xiongyihui opened this issue

The PicovoiceCheetahASREngine is super fast, but is not accurate based on my test.

Is it suitable to use Levenshtein distance as the WER?

From Wikipedia

The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level.
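
For reference, the usual definition counts word-level substitutions $S$, deletions $D$, and insertions $I$ against the $N$ words of the reference transcript. The symbols here follow the common textbook convention and are not taken from this thread:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

The numerator $S + D + I$ is the Levenshtein distance between the reference and the hypothesis when both are treated as sequences of words.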

Here are some transcripts from Common Voice's cv-valid-dev set.

from "go into your dance" to "go into gour dance"
from "when you start to eat like this something is the matter" to "when you start to eat like this something is the matter"
from "i had seen all that it would presently bring me" to "i have feen all that it would presently bring me"
from "he moved about invisible but everyone could hear him" to "he moved abut and vasible but everyone could hear him"
from "mr lee can't be bothered now" to "mister lincal be bothered now"
from "the shepherd swore that he would" to "the shepherd swore that he would"
from "it must have fallen while i was sitting over there" to "i must have fallnwhile i was sitting over there"
from "just like an organ" to "just like in organ"
from "and the solid part was called the philosopher's stone" to "and the solid park was caurds af alows of her sto"
from "raisins are delicious" to "raisoned arylitious"
from "get the governor on the phone" to "got the governor on the fo"
from "so then try he said to the englishman" to "so then try he said to the englishman"
from "i thought about whether we should find coins and models in it and so on" to "i thought about whether of we should find coins and lodtels in it and to on"
from "the angel touched the man's shoulder and they were both projected far into the future" to "the ageel touched lhe ment showter and thet eiere both projectted far into the futur"
from "lots of places sell tea around here the merchant said" to "the lass of pases selt he aroud here thand much in se"
from "everyone on earth has a treasure that awaits him his heart said" to "onper one on perds has a triture that the waiys hom hes hart said"
from "i'm beginning to like this" to "i am beginning to like this"
from "all they wanted was food and water" to "all they wanted witsh food and water"
from "it has happened many times before" to "it does not many danse before"
from "but most important he was able every day to live out his dream" to "but most important he was able every day to live pout his dream"
from "because you have already lost your savings twice" to "because you have already lost hor savings twice"
from "whenever he could he sought out a new road to travel" to "whenever he colld he sai tout and yow rod to trallo"
from "he was about the same age and height as the boy" to "he was about the same age and height as the"
from "drawing from my own experience as a learner of english and german i value engaging activities that involve everyday conversation" to "rong from mor own experience as the lone of english and german i tout jenin gatehing at tomatis that involve every day contersation"
from "they set off running wildly into the trees" to "they set off fonning wild leans hrough the trees"
from "but finally the merchant appeared and asked the boy to shear four sheep" to "the finally the merchant oppeared in asked the boy to sheer for sheep"
from "the merchant looked anxiously at the boy" to "the merchon non anxiously af the bo"
from "the nurse waddled around the ward" to "the nervese watld around the ward"
from "where did he keep his money" to "where do he caep his momey"
from "i learned the alchemist's secrets in my travels" to "i learned the alchemistsecrets in my trave"
from "everything in life is an omen said the englishman now closing the journal he was reading" to "averything in life is an omen said the englishman now closing the journal he was realing"
from "the boy was startled" to "the boy wast storted"
from "half an hour later his shovel hit something solid" to "al aleyre lihter he shavell hit subething sollid"
from "i heard a faint movement under my feet" to "thhard faint movement under my feet"
from "it is i the boy answered" to "it is i the boy answered"
from "its been a long time since she last read chekhov and because of that she no longer feels like the heroine of her own story" to "tin along time since she last read checkel and because of that she no longer fiel like the herror one of her ens story"
from "i learned how to care for sheep and i haven't forgotten how that's done" to "i learn how to kare for sheep and i haven't forgotten 'll dats then"
from "they placed the symbols of the pilgrimage on the doors of their houses" to "e place the sembols of the tigrimice on the doors of their houses"
from "i heard a faint movement under my feet" to "i heard a faint movement on the my fet"
from "he could always go back to being a shepherd" to "he could alwas go back to being a shepherd"
from "its lower end was still embedded" to "it's lower and wesstill imbidded"
from "hundreds must have seen it and taken it for a falling star" to "hundreds must have seen it and taken in for a falling stan"
from "as the sun rose the men began to beat the boy" to "as the sun rosed man began to peat the boy"
from "i feel i ought to take care of her" to "i feel i aught to take car of her"
from "i have to find a man who knows that universal language" to "i have to find a mann  who knows that univrsal lang"
from "the burning fire had been extinguished" to "the burning far had been extinguished"
from "how come you speak spanish he asked" to "i'l gons biaks that anish he sk"
from "i thought tonight i'd put miss kelly there" to "i thought tonig to id but miss carly there"
from "the cursor blinked expectantly" to "the curser bliked expectantly"
from "the boy mumbled an answer that allowed him to avoid responding to her question" to "the boy mumbled and onser that a loued him to avoide was ponding to her question"
from "fresh coffee is much better than the freeze dried stuff" to "fresh coffee is much better then the freezes drives stuff"
from "but you will love her and she'll return your love" to "but you were love her and she'll return your love"
from "i think they're going to last for a long time he said to the monk" to "i think they're going to last for a long time he said to the mon"
from "and then he perceived it very slowly" to "and hany perceived it very slowly"
from "my wife pointed out to me the brightness of the red green and yellow signal lights" to "my wife pointed out to me the brightness of the red green and yellow signal lights"
from "they become the soul o f the world" to "then come to saw of the warl"
from "what's going on here" to "what's going on here"
from "the alchemist knocked on the gate of the monastery" to "is the alchemist nocked on the gait of the monastery"
from "her manipulation failed" to "ere minipulation fai"
from "why the shots stopped after the tenth no one on earth has tried to explain" to "why te shot stopped after the tenth no one on earth has tried ting thran"
from "that's the man who knows all the secrets of the world she said" to "that's the man hur knows all the secrets of the word she is said"
from "they reached the center of a large plaza where the market was held" to "there reach the sentr af a large plase awere the market was helt"
from "he was thinking about omens and someone had appeared" to "he was thinking about omens and someone had appeared"
from "no sense messing up the streets" to "no fhende metting up the street"
from "that's what i'm not supposed to say" to "but if what i've not tre posed to fary"
2018-08-08 15:53:18,429 WER = 0.336652

Hello @xiongyihui

I think there are two questions here: (1) the use of Levenshtein distance and (2) accuracy. Let's address (1) first and, once that is resolved, move on to (2).

The code currently uses Levenshtein distance to compute the WER, following the Wikipedia definition you quoted. It is here: https://github.com/Picovoice/stt-benchmark/blob/master/benchmark.py#L15-L29
Does this answer question (1)?

Word Error Rate (WER) works at the word level, but the Levenshtein distance in the code is computed at the character level.

I believe it works at the word level. If you run the function with the arguments below:

_word_error_rate("this is ref transcript", "that is transcript")

it gives you 0.5, which is the correct WER. There are 4 words in the reference, with 1 substitution (this -> that) and 1 deletion (ref), so WER = 2 / 4 = 0.5.

If I am missing something, please elaborate and I'll be happy to apply fixes.
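
The word-level versus character-level distinction can be checked with a minimal, dependency-free sketch. This is not the repository's code: `edit_distance` and `word_error_rate` are hypothetical names, and the dynamic-programming routine stands in for whatever edit-distance call the benchmark actually uses. The only assumption is that transcripts are split on whitespace before comparison.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Works on any sequences: lists of words give a word-level distance,
    plain strings give a character-level distance.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(ref, hyp):
    """Hypothetical helper: word-level edit distance over reference length."""
    ref_words = ref.split()
    hyp_words = hyp.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(word_error_rate("this is ref transcript", "that is transcript"))  # 0.5
```

Passing the two raw strings to `edit_distance` instead of the word lists would count character edits, which is the behaviour the question above was worried about; splitting into words first is what makes the result a WER.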

Oh, sorry, I misunderstood editdistance.eval().