k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust

Home Page: https://k2-fsa.github.io/sherpa/onnx/index.html


[csharp] SpeakerEmbeddingExtractor uses 6 GB of memory and keeps growing; is this normal, or is it a memory leak?

bnuzhouwei opened this issue · comments

I dynamically identify speakers in a wav file (about 40 min), repeated 10 times. Without the following code, my program uses only about 1.4 GB of memory, but with it, memory easily reaches 6 GB and just keeps growing...

    public float[] ComputeEmbedding(float[] samples)
    {
        // Create a fresh stream per segment; `using` disposes it when done.
        using OnlineStream stream = SpeakerEmbeddingExtractor.CreateStream();
        stream.AcceptWaveform(DetectorConfig.SampleRate, samples);
        stream.InputFinished();
        return SpeakerEmbeddingExtractor.Compute(stream);
    }

    var speakid = "";
    var embedding = ComputeEmbedding(segment.Samples);
    speakid = SpeakerEmbeddingManager.Search(embedding, 0.5f);
    if (string.IsNullOrEmpty(speakid))
    {
        // Unknown speaker: register the embedding under a new id.
        speakid = speakeridIndex.ToString();
        SpeakerEmbeddingManager.Add(speakid, embedding);
        speakeridIndex++;
    }

Is this a memory leak, or does it simply require that much memory?
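One way to tell the two apart is to log both the managed heap and the process working set around the embedding calls. This is a minimal sketch, not part of the pipeline above; the class and method names are illustrative. If the working set keeps climbing while the managed heap stays flat, the growth is in native (onnxruntime) memory rather than in .NET objects.

```csharp
using System;
using System.Diagnostics;

class MemoryProbe
{
    // Log the managed-heap size and the process working set, e.g. before
    // and after each batch of ComputeEmbedding calls.
    public static void Report(string label)
    {
        // Force a full collection so the managed figure is meaningful.
        long managed = GC.GetTotalMemory(forceFullCollection: true);
        long workingSet = Process.GetCurrentProcess().WorkingSet64;
        Console.WriteLine($"{label}: managed={managed / 1024 / 1024} MB, " +
                          $"workingSet={workingSet / 1024 / 1024} MB");
    }
}
```

If the managed number stays near-constant while the working set grows, disposing managed wrappers alone will not reclaim the memory, which would point at the native side.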

Does speaker id index also increase?

Increase up to 24.

Does it mean the max value for speakeridIndex is 24 and the RAM still increases after speakeridIndex reaches 24?

Is it possible to share the complete code?

Yes, the max speakeridIndex reaches 24, but the memory still increases over time. The full code is as follows; I just combined the VAD, Paraformer, punctuation, and speaker-ID models into a pipeline.

The VAD splits overly long wavs, and ComputeEmbedding costs about 500 MB for a 10 s wav, so the segment length passed to it needs to be limited!
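One way to enforce that limit is to slice each VAD segment into bounded chunks before embedding. This is a sketch; the 10 s cap in the test below is an assumption, not a value mandated by the library.

```csharp
using System;
using System.Collections.Generic;

static class SegmentSplitter
{
    // Split a sample buffer into chunks of at most maxSeconds each, so a
    // single ComputeEmbedding call never sees an arbitrarily long segment.
    public static List<float[]> Split(float[] samples, int sampleRate, int maxSeconds)
    {
        int maxLen = sampleRate * maxSeconds;
        var chunks = new List<float[]>();
        for (int start = 0; start < samples.Length; start += maxLen)
        {
            int len = Math.Min(maxLen, samples.Length - start);
            var chunk = new float[len];
            Array.Copy(samples, start, chunk, 0, len);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

The per-chunk embeddings can then be searched individually, or averaged into one vector per segment; averaging is a common choice but not something the library requires.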

There are vad, offline asr, punct, and speaker embedding extractor models.

Could you remove one of them, run again, and see which model causes the RAM to increase?

I have tested it: the speaker embedding extractor model is what causes the increase over time.

The VAD and offline ASR stay stable at 1.4 GB.

But the speaker embedding extractor grows to 8 GB and more over time.
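A quick way to confirm the extractor alone drives the growth is to run the `ComputeEmbedding` method from the first snippet in a loop over one fixed buffer and watch the process working set: if memory climbs with iteration count alone, the leak is per call. This is a sketch; the delegate parameter is only there so the probe does not itself depend on any model being loaded.

```csharp
using System;
using System.Diagnostics;

static class LeakProbe
{
    // Run the embedding computation repeatedly on one fixed buffer and
    // report the process working set. `computeEmbedding` is the pipeline's
    // own ComputeEmbedding method (or any stand-in for testing).
    public static long Stress(Func<float[], float[]> computeEmbedding,
                              float[] samples, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            float[] embedding = computeEmbedding(samples);
            if (i % 10 == 0)
            {
                long mb = Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024;
                Console.WriteLine($"iter {i}: workingSet={mb} MB, dim={embedding.Length}");
            }
        }
        return Process.GetCurrentProcess().WorkingSet64;
    }
}
```

A roughly linear working-set curve over iterations would indicate a per-call leak in the extractor or its native session, rather than a one-time model-loading cost.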

What about the punctuation model?

The punctuation model has bugs that keep it from running for long, so I haven't stress-tested it yet.