k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust.

Home Page: https://k2-fsa.github.io/sherpa/onnx/index.html

Invoked with: <_sherpa_onnx.SpeakerEmbeddingExtractor object at 0x7f6b40d15eb0>, <_sherpa_onnx.OfflineStream object at 0x7f6b42ebf8b0>

MonolithFoundation opened this issue

How can I reuse the stream from the recognizer?

First I use a VAD to detect voice segments, then I create a stream with the recognizer and decode it, and finally the speaker model gets the speaker's name:

        streams = []
        segments: List[Segment] = []
        while not vad.empty():
            segment = Segment(
                start=vad.front.start / args.sample_rate,
                duration=len(vad.front.samples) / args.sample_rate,
            )
            segments.append(segment)

            stream = recognizer.create_stream()
            stream.accept_waveform(args.sample_rate, vad.front.samples)

            streams.append(stream)

            vad.pop()

        print(streams)
        for s in streams:
            recognizer.decode_stream(s)

        for seg, stream in zip(segments, streams):
            seg.text = stream.result.text
            segment_list.append(seg)

        # add for speaker identification
        embedding = extractor.compute(stream)
        embedding = np.array(embedding)
        name = manager.search(embedding, threshold=args.threshold)

I got this error:

Invoked with: <_sherpa_onnx.SpeakerEmbeddingExtractor object at 0x7f6b40d15eb0>, <_sherpa_onnx.OfflineStream object at 0x7f6b42ebf8b0>

Any help?

Please read our Python examples about how to use the speaker extractor.


Your code above clearly shows how you create the stream: you use recognizer.create_stream() to get a stream and pass it to the extractor. That is why you get the error.

You get this error since you have not followed the Python examples.
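
For reference, here is a minimal sketch of the pattern the Python speaker examples use, reusing the names (vad, args, extractor, manager) from the snippet above; the key point is that the extractor's stream must come from extractor.create_stream(), not recognizer.create_stream(). The exact example code in the repository may differ slightly.

        import numpy as np

        # The extractor needs a stream created by the extractor itself,
        # not an OfflineStream created by the recognizer.
        samples = vad.front.samples

        spk_stream = extractor.create_stream()
        spk_stream.accept_waveform(args.sample_rate, samples)
        spk_stream.input_finished()

        embedding = np.array(extractor.compute(spk_stream))
        name = manager.search(embedding, threshold=args.threshold)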

Thank you for your assistance. Can the stream be reused? Does it need to be created for each component?

Apologies for the simplistic questions. I am not as familiar with your code as you are.

Please find example code in
https://github.com/k2-fsa/sherpa-onnx/tree/master/python-api-examples

You can find the answer by reading the example code.

(Search for files containing speaker in the filename)

Thank you, I have made it work.

I still want to ask: a separate stream is created for each component, and each stream contains the audio bytes. Why can't a stream be created from another stream, or the previous stream's bytes be assigned to the next component?

Sorry, I cannot understand what you mean. You can ask in Chinese if you want, or please describe your issue in more detail.
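
To illustrate the point behind the question above: each stream holds per-component state, so streams are not constructed from one another, but the raw samples can simply be kept and fed into each component's own stream. A sketch under that assumption, again reusing the names from the earlier snippet:

        samples = vad.front.samples  # keep the raw audio once

        # Each component gets its own stream, but both consume the same samples,
        # so no bytes need to be copied from one stream into another.
        asr_stream = recognizer.create_stream()
        asr_stream.accept_waveform(args.sample_rate, samples)

        spk_stream = extractor.create_stream()
        spk_stream.accept_waveform(args.sample_rate, samples)
        spk_stream.input_finished()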