ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

How did you measure computationally unaware latency?

lifefeel opened this issue

This question is not about a code issue.

I read your paper and would like to know more specifically about computationally unaware latency.

In your code, the computationally unaware mode does not measure latency. How did you measure it?

I would like to reproduce your results.

Thanks.

I have separate code for it. It's described in the paper: WER-align the words in the hypothesis and the gold transcript, and then take the difference of their timestamps.
Do you need the code? I think I can publish it.
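
In code terms, the alignment step is roughly this (a minimal sketch of standard edit-distance word alignment, not the published script; all names here are mine):

def align(gold, hyp):
    """Levenshtein-align two word lists; return (gold_i, hyp_j) pairs of exact matches."""
    n, m = len(gold), len(hyp)
    # dp[i][j] = edit distance between gold[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (gold[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1,   # word missing from hypothesis
                           dp[i][j - 1] + 1)   # word inserted in hypothesis
    # backtrace, keeping only exactly matched word pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + (gold[i - 1] != hyp[j - 1]):
            if gold[i - 1] == hyp[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs

The per-word latency is then the difference between the emission time of the hypothesis word and a gold timestamp of its aligned counterpart (e.g. the word's end time), averaged over all matched words.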

Yes, it would be very helpful for reproducing the same results.

So, here it is:

lat.zip

Simple usage:

python3 latency.py $doc/orto+ts $asr $lan > $asr.lat

  • $doc/orto+ts is the gold transcript, one word per line with its start and end timestamps in milliseconds, in a format like this (a parsing sketch follows this list):
3347 3497 Herr
3497 4227 Präsident,
4227 4957 Herr
4957 5337 Kommissar,
5337 5627 verehrte
5627 6117 Kollegen,
6117 6237 die
6237 7177 Diskussion,
  • $asr is the real-time ASR hypothesis file, the output of Whisper-Streaming, in a format like this:
9594.1408 0 5540 Herr Präsident, Herr Kommissar, verehrte
11769.4223 5560 7940 Kollegen, die Diskussion, die wir
13601.9289 7940 8480 jetzt führen,
16274.7409 8960 12180 zieht darauf ab, Tierversuche zu verringern und
18144.7799 12180 14200 die Bedingungen zu verbessern.

The first number on each line is the emission time of the line in milliseconds. The other two numbers are ignored.

  • $lan is not a file, but a variable containing the language code for MosesTokenizer, e.g. "en" or "de"

  • $asr.lat is the output file that is created; it contains e.g. 5810.91, the average word latency in milliseconds
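
For completeness, here is a rough sketch of how such a script could wire the pieces together, reusing align from the sketch above (hypothetical code, not the contents of lat.zip; it assumes the sacremoses package for MosesTokenizer, and file names are placeholders):

from sacremoses import MosesTokenizer

def read_gold(path, lang):
    """Gold transcript: 'start_ms end_ms word' per line; tokenize to split off punctuation."""
    tok = MosesTokenizer(lang=lang)
    words, ends = [], []
    with open(path) as f:
        for line in f:
            _start, end, word = line.split(maxsplit=2)
            for w in tok.tokenize(word.strip()):
                words.append(w)
                ends.append(int(end))   # each sub-token keeps the word's end time
    return words, ends

def read_asr(path, lang):
    """ASR hypothesis: 'emission_ms beg_ms end_ms text...' per line.
    Every word on a line gets that line's emission time."""
    tok = MosesTokenizer(lang=lang)
    words, emits = [], []
    with open(path) as f:
        for line in f:
            emit, _beg, _end, text = line.split(maxsplit=3)
            for w in tok.tokenize(text.strip()):
                words.append(w)
                emits.append(float(emit))
    return words, emits

gold_words, gold_ends = read_gold("orto+ts", "de")
hyp_words, hyp_emits = read_asr("asr.out", "de")
lats = [hyp_emits[j] - gold_ends[i] for i, j in align(gold_words, hyp_words)]
print(round(sum(lats) / len(lats), 2))   # average word latency in milliseconds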

I hope it helps. Sorry for the messy scripts; they haven't been cleaned up for publication.

Please reopen if you have a question or an issue.