k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust

Home Page: https://k2-fsa.github.io/sherpa/onnx/index.html


[csharp] SpeakerEmbeddingExtractor uses 6 GB of memory and keeps growing; is this normal, or is it a memory leak?

bnuzhouwei opened this issue · comments

I dynamically identify speakers in a wav file (about 40 min), repeated 10 times. Without the following code, my program uses only about 1.4 GB of memory, but with it, memory easily reaches 6 GB and just keeps growing...

    public float[] ComputeEmbedding(float[] samples)
    {
        // Create a fresh stream per segment; `using` disposes it when done.
        using OnlineStream stream = SpeakerEmbeddingExtractor.CreateStream();
        stream.AcceptWaveform(DetectorConfig.SampleRate, samples);
        stream.InputFinished();
        return SpeakerEmbeddingExtractor.Compute(stream);
    }

    var speakid = "";
    var embedding = ComputeEmbedding(segment.Samples);
    speakid = SpeakerEmbeddingManager.Search(embedding, 0.5f);
    if (string.IsNullOrEmpty(speakid))
    {
        // Unknown speaker: register the embedding under a new id.
        speakid = speakeridIndex.ToString();
        SpeakerEmbeddingManager.Add(speakid, embedding);
        speakeridIndex++;
    }

Is this a memory leak, or does it simply require that much memory?
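One way to tell the two apart is to log both the managed heap and the process working set around the embedding calls. This is a minimal sketch, not part of the pipeline above; the class and method names are illustrative. If the working set keeps climbing while the managed heap stays flat, the growth is in native (onnxruntime) memory rather than in .NET objects.

```csharp
using System;
using System.Diagnostics;

class MemoryProbe
{
    // Log the managed-heap size and the process working set, e.g. before
    // and after each batch of ComputeEmbedding calls.
    public static void Report(string label)
    {
        // Force a full collection so the managed figure is meaningful.
        long managed = GC.GetTotalMemory(forceFullCollection: true);
        long workingSet = Process.GetCurrentProcess().WorkingSet64;
        Console.WriteLine($"{label}: managed={managed / 1024 / 1024} MB, " +
                          $"workingSet={workingSet / 1024 / 1024} MB");
    }
}
```

If the managed number stays near-constant while the working set grows, disposing managed wrappers alone will not reclaim the memory, which would point at the native side.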

Does speaker id index also increase?

Increase up to 24.

Does it mean the max value for speakeridIndex is 24 and the RAM still increases after speakeridIndex reaches 24?

Is it possible to share the complete code?

Yes, the max speakeridIndex reaches 24, but the memory still increases over time. The full code is as follows; I just combined the VAD, Paraformer, punctuation, and speaker-ID models into a pipeline.

The VAD splits overly long wavs, and ComputeEmbedding costs about 500 MB for a 10 s wav, so the segment length passed to it needs to be limited!
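One way to enforce that limit is to slice each VAD segment into bounded chunks before embedding. This is a sketch; the 10 s cap in the test below is an assumption, not a value mandated by the library.

```csharp
using System;
using System.Collections.Generic;

static class SegmentSplitter
{
    // Split a sample buffer into chunks of at most maxSeconds each, so a
    // single ComputeEmbedding call never sees an arbitrarily long segment.
    public static List<float[]> Split(float[] samples, int sampleRate, int maxSeconds)
    {
        int maxLen = sampleRate * maxSeconds;
        var chunks = new List<float[]>();
        for (int start = 0; start < samples.Length; start += maxLen)
        {
            int len = Math.Min(maxLen, samples.Length - start);
            var chunk = new float[len];
            Array.Copy(samples, start, chunk, 0, len);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

The per-chunk embeddings can then be searched individually, or averaged into one vector per segment; averaging is a common choice but not something the library requires.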

There are vad, offline asr, punct, and speaker embedding extractor models.

Could you remove one of them, run again, and see which model causes the RAM to increase?

I have tested it: the speaker embedding extractor model is what causes the increase over time.

The VAD and offline ASR stay stable at 1.4 GB.

But the speaker embedding extractor grows to 8 GB and more over time.
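A quick way to confirm the extractor alone drives the growth is to run the `ComputeEmbedding` method from the first snippet in a loop over one fixed buffer and watch the process working set: if memory climbs with iteration count alone, the leak is per call. This is a sketch; the delegate parameter is only there so the probe does not itself depend on any model being loaded.

```csharp
using System;
using System.Diagnostics;

static class LeakProbe
{
    // Run the embedding computation repeatedly on one fixed buffer and
    // report the process working set. `computeEmbedding` is the pipeline's
    // own ComputeEmbedding method (or any stand-in for testing).
    public static long Stress(Func<float[], float[]> computeEmbedding,
                              float[] samples, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            float[] embedding = computeEmbedding(samples);
            if (i % 10 == 0)
            {
                long mb = Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024;
                Console.WriteLine($"iter {i}: workingSet={mb} MB, dim={embedding.Length}");
            }
        }
        return Process.GetCurrentProcess().WorkingSet64;
    }
}
```

A roughly linear working-set curve over iterations would indicate a per-call leak in the extractor or its native session, rather than a one-time model-loading cost.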

What about the punctuation model?

The punctuation model has bugs that keep it from running for long, so I haven't stress-tested it yet.