jianchang512 / stt

Voice Recognition to Text Tool / an offline, locally run speech-to-text service that outputs JSON, SRT subtitles with timestamps, or plain text

Home Page: https://pyvideotrans.com


Big file can be recognized but shows no result.

WisdomLove opened this issue · comments

One mp4 file of about 100 MB, roughly 1 hour long, on CUDA with float32: the result displays nothing after recognition finishes.
The small sample mp4 file is recognized fine.

with model large-v3

What is the output in the CMD window?

Set cuda_com_type=int8 and retry.

CMD output: nothing except the onnxruntime 1983 warning I mentioned in the last issue.

I will try int8 later. Thanks a lot.

The problem happened again; I will switch to CPU.

cuda_com_type = int8_float16

If you deployed from source code, change line 107 from

segments, info = modelobj.transcribe(wav_file, beam_size=1, best_of=1, temperature=0, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500), language=language)

to

segments, info = modelobj.transcribe(wav_file, beam_size=5, best_of=5, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500), language=language)
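For reference, a minimal, self-contained sketch of how such a call usually sits in a faster-whisper script; the model size, compute type, and file path below are placeholders, not this project's actual values:

from faster_whisper import WhisperModel

# Placeholder setup: adjust model size, device and compute type to your hardware.
modelobj = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")

segments, info = modelobj.transcribe(
    "sample.wav",                # placeholder path
    beam_size=5,
    best_of=5,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
    language="en",
)

# segments is a generator: nothing is transcribed until it is iterated.
for seg in segments:
    print(f"{seg.start:.2f} --> {seg.end:.2f} {seg.text}")

Because segments is lazy, an empty output can also mean the iteration never ran to completion, not only that VAD removed all the audio.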

The CPU test passed with a normal SRT result for about 5500 s of audio; I will try cuda_com_type = int8_float16 later.

One mp4 file of about 100 MB, roughly 68 minutes, on CUDA with int8_float16: the result displays nothing after recognition.
Another mp4 file of about 100 MB, roughly 62 minutes, on CUDA with int8_float16: the result is displayed.

The result still cannot be shown.

Update to 0.91, open set.ini, and try adjusting the parameters at the bottom; each one has a comment. You can tune them between maximum and minimum GPU consumption.

Wow, it's hard to adjust... I don't know how to adjust them; the comments are not very clear.

web_address=127.0.0.1:9977
lang=en
devtype=cuda
cuda_com_type=float32
beam_size=5
best_of=5
vad=true
temperature=1
condition_on_previous_text=true

This gives the best results, but it also consumes the most GPU.


web_address=127.0.0.1:9977
lang=en
devtype=cuda
cuda_com_type=int8
beam_size=1
best_of=1
vad=false
temperature=0
condition_on_previous_text=false

This is the most GPU-efficient configuration, but the results are relatively poor.
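For orientation, a hedged sketch of how these set.ini keys presumably map onto the underlying faster-whisper call; the variable names, model size, and file path are illustrative, not this project's code:

from faster_whisper import WhisperModel

# Illustrative mapping of set.ini keys to faster-whisper options.
cfg = {
    "devtype": "cuda",             # -> device
    "cuda_com_type": "int8",       # -> compute_type (float32: best quality, int8: least VRAM)
    "beam_size": 1,                # larger beams improve quality but cost memory and time
    "best_of": 1,
    "vad": False,                  # -> vad_filter
    "temperature": 0,
    "condition_on_previous_text": False,
    "lang": "en",
}

model = WhisperModel("large-v3", device=cfg["devtype"], compute_type=cfg["cuda_com_type"])
segments, info = model.transcribe(
    "input.wav",                   # placeholder path
    beam_size=cfg["beam_size"],
    best_of=cfg["best_of"],
    vad_filter=cfg["vad"],
    temperature=cfg["temperature"],
    condition_on_previous_text=cfg["condition_on_previous_text"],
    language=cfg["lang"],
)

Broadly, compute_type and beam_size/best_of are the settings that most affect GPU memory use, while vad, temperature, and condition_on_previous_text mainly affect output quality.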

The speech-recognition-to-subtitles feature in that project is the same as this project's speech recognition; both use faster-whisper. Perhaps you can download it and give it a try.

https://github.com/jianchang512/pyvideotrans

web_address=127.0.0.1:9977
lang=en
devtype=cuda
cuda_com_type=float32
beam_size=5
best_of=5
vad=true
temperature=1
condition_on_previous_text=true
(screenshot SNAG-0278 attached)

It consumes all my GPU, which is nice.
But is the difference mainly speed, or does it change the final result?

{'web_address': '127.0.0.1:9977', 'lang': 'en', 'devtype': 'cuda', 'cuda_com_type': 'float32', 'beam_size': 5, 'best_of': 5, 'vad': True, 'temperature': 1, 'condition_on_previous_text': True}

The browser is open. If it does not open automatically, please open the URL manually http://127.0.0.1:9977
res.status_code=200
d={'version': 'v0.0.91', 'version_num': 91}
2024-01-29 21:28:53.8140166 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1983 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
CUDA failed with error out of memory
What options do I have to prevent out of memory?

pyvideotrans is a perfect tool.

web_address=127.0.0.1:9977
lang=en
devtype=cuda
cuda_com_type=int8
beam_size=1
best_of=1
vad=false
temperature=0
condition_on_previous_text=false

This is the most GPU-efficient configuration, but the results are relatively poor.
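If out-of-memory errors keep occurring, one option is to check free VRAM before loading the model and fall back to the cheaper settings automatically. This sketch assumes PyTorch with CUDA support is installed (not necessarily a dependency of this project), and the 10 GB threshold is only a rough guess:

import torch
from faster_whisper import WhisperModel

# Free/total GPU memory in bytes on device 0 (requires a CUDA-enabled PyTorch build).
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
free_gb = free_bytes / 1024**3

# Rough heuristic: large-v3 in float32 with beam_size=5 needs far more VRAM than
# int8 with beam_size=1, so drop to the cheap settings when memory is tight.
compute_type = "float32" if free_gb > 10 else "int8"
beam_size = 5 if compute_type == "float32" else 1

model = WhisperModel("large-v3", device="cuda", compute_type=compute_type)
segments, info = model.transcribe("input.wav", beam_size=beam_size, vad_filter=True, language="en")
for seg in segments:
    print(seg.text)

Alternatively, simply keeping cuda_com_type=int8 and beam_size=1 in set.ini, as in the configuration above, is the most reliable way to stay within limited GPU memory.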