mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Chinese voice error?

opentld opened this issue

Platform: Windows 10, VS2019, CUDA 10.2.
The English model works correctly, but the Chinese model gives the wrong output:
the input Chinese speech is 测试,测试 ("test, test"), but the output is: 鍘邋邋

speechTest --model deepspeech-0.9.3-models-zh-CN.pbmm --audio 3.wav
2022-05-22 12:11:55.426120: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-05-22 12:11:55.436894: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-22 12:11:55.440660: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2022-05-22 12:11:55.469860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2022-05-22 12:11:55.470061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2022-05-22 12:11:55.473859: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2022-05-22 12:11:55.477597: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2022-05-22 12:11:55.479654: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2022-05-22 12:11:55.483263: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2022-05-22 12:11:55.485340: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2022-05-22 12:11:55.491976: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2022-05-22 12:11:55.492126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2022-05-22 12:11:55.831924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-22 12:11:55.832025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2022-05-22 12:11:55.832240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2022-05-22 12:11:55.832449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6674 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
audio_format=6
num_channels=2
sample_rate=16000 (desired=16000)
bits_per_sample=8
res.buffer_size=291939
2022-05-22 12:11:56.096138: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
the result is: 鍘邋邋
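For reference, the header values printed above can be compared with the input format the DeepSpeech models consume: 16 kHz, 16-bit, mono, linear PCM. In the WAVE format-tag registry, tag 1 is PCM and tag 6 is A-law (`WAVE_FORMAT_ALAW`), so `audio_format=6` with `num_channels=2` and `bits_per_sample=8` describes 8-bit stereo A-law rather than 16-bit mono PCM. The sketch below decodes the first 16 bytes of a `fmt ` chunk body and tests those four fields. `wav_fmt`, `parse_fmt` and `is_deepspeech_ready` are hypothetical names for illustration, not part of the DeepSpeech client.

```c
#include <stddef.h>
#include <stdint.h>

/* WAV header fields are stored little-endian. */
static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Format tags from the WAVE registry: 1 = PCM, 6 = A-law. */
enum { WAV_FMT_PCM = 1, WAV_FMT_ALAW = 6 };

/* Hypothetical struct: the fmt fields the reporter's program prints. */
typedef struct {
    uint16_t audio_format;
    uint16_t num_channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} wav_fmt;

/* Hypothetical helper: decode the first 16 bytes of a "fmt " chunk body
   (in a canonical file that body starts at byte 20). Returns 0 on success. */
static int parse_fmt(const unsigned char *body, size_t len, wav_fmt *out) {
    if (len < 16) return -1;
    out->audio_format    = read_le16(body + 0);
    out->num_channels    = read_le16(body + 2);
    out->sample_rate     = read_le32(body + 4);
    /* bytes 8..11: byte rate, bytes 12..13: block align (unused here) */
    out->bits_per_sample = read_le16(body + 14);
    return 0;
}

/* Hypothetical helper: 1 if the stream is 16 kHz, 16-bit, mono, linear PCM. */
static int is_deepspeech_ready(const wav_fmt *f) {
    return f->audio_format == WAV_FMT_PCM && f->num_channels == 1 &&
           f->sample_rate == 16000 && f->bits_per_sample == 16;
}
```

With the header from this report, `is_deepspeech_ready` returns 0: the format tag, channel count and sample width all differ from what the model expects, even though the sample rate matches.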

@djmitche @Elleo @KathyReid @danielwinkler @hwine

I recorded an .m4a file in Chinese with the Windows 'Voice Recorder' app and then converted it to .wav:
audio_format=6
num_channels=2
sample_rate=16000 (desired=16000)
bits_per_sample=8
res.buffer_size=291939
It seems that the code runs correctly:

// byte 40: data-chunk size, assuming a canonical 44-byte WAV header
fseek(wave, 40, SEEK_SET);
rv = fread(&res.buffer_size, 4, 1, wave);
fprintf(stderr, "res.buffer_size=%ld\n", res.buffer_size);

// byte 44: start of the samples, under the same assumption
fseek(wave, 44, SEEK_SET);
res.buffer = (char*)malloc(sizeof(char) * res.buffer_size);
rv = fread(res.buffer, sizeof(char), res.buffer_size, wave);

res.buffer_size is 291939, which means that the voice file has been read.
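A plausible-looking number at byte 40 does not by itself show that the samples were read. Two quick checks are sketched below as hypothetical helpers (`le32_bytes`, `is_whole_frames`). First, a data size should be a whole number of frames of `num_channels * bits_per_sample / 8` bytes; a stereo 8-bit frame is 2 bytes, and 291939 is odd. Second, splitting the value into its little-endian bytes shows what was actually stored at bytes 40 to 43: 291939 is 0x00047463, i.e. the bytes 0x63 0x74 0x04 0x00, which are the characters `c` and `t` followed by the first two bytes of a size field equal to 4. That is consistent with the tail of a `fact` chunk id and its size field rather than a data-chunk size, though this is an inference from the arithmetic, not something confirmed in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: split v into the bytes that fread() would have
   taken from the file on a little-endian machine (x86/x64), lowest first. */
static void le32_bytes(uint32_t v, unsigned char out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical helper: 1 if data_size is a whole number of sample frames
   for the given channel count and sample width, else 0. */
static int is_whole_frames(uint32_t data_size, unsigned num_channels,
                           unsigned bits_per_sample) {
    unsigned frame_bytes = num_channels * ((bits_per_sample + 7) / 8);
    if (frame_bytes == 0) return 0;
    return data_size % frame_bytes == 0;
}
```

Either check on its own would have flagged 291939 as suspect for an 8-bit stereo stream.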

But when I convert the .m4a file to a mono .wav:
audio_format=1
num_channels=1
sample_rate=16000 (desired=16000)
bits_per_sample=16
res.buffer_size=62
2022-05-22 13:15:58.042539: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
the result is: ?

Here res.buffer_size=62, so it looks like the voice file was not read correctly.

Why?
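One likely explanation, not confirmed in this thread: the header-reading snippet above assumes the canonical 44-byte layout, in which the `data` chunk's size sits at byte 40 and the samples start at byte 44. Converters may place other chunks (for example a `LIST` metadata chunk) between `fmt ` and `data`; in that case byte 40 holds the size of that other chunk, and a value such as 62 would be its length, not the audio's. A more robust reader walks the chunk list and finds `data` by its id. The sketch below does this on a file already loaded into memory; `find_chunk` is a hypothetical helper, not a DeepSpeech API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: walk the RIFF/WAVE chunk list of an in-memory file
   and locate the chunk whose four-character id equals `id` ("fmt ", "data").
   On success, stores the offset of the chunk body and its size (clamped to
   the bytes actually present) and returns 0; otherwise returns -1. */
static int find_chunk(const unsigned char *file, size_t len, const char id[4],
                      size_t *body_off, uint32_t *body_size) {
    if (len < 12 || memcmp(file, "RIFF", 4) != 0 ||
        memcmp(file + 8, "WAVE", 4) != 0)
        return -1;
    size_t pos = 12; /* first sub-chunk follows "RIFF", size, "WAVE" */
    while (pos + 8 <= len) {
        uint32_t size = le32(file + pos + 4);
        size_t avail = len - (pos + 8);
        if (memcmp(file + pos, id, 4) == 0) {
            *body_off = pos + 8;
            *body_size = (size > avail) ? (uint32_t)avail : size;
            return 0;
        }
        /* chunk bodies are padded to an even number of bytes */
        size_t next = pos + 8 + (size_t)size + (size & 1u);
        if (next <= pos) break; /* guard against size_t wrap-around */
        pos = next;
    }
    return -1;
}
```

In a program like the reporter's, this would replace the two fixed `fseek` offsets: read the whole file into a buffer, call `find_chunk` for `"fmt "` and `"data"`, and pass the `data` body on only after confirming the `fmt ` fields describe 16 kHz, 16-bit mono PCM.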

FYI -- #3693

DeepSpeech is unmaintained, please see #3693.