xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

Home Page: https://huggingface.co/docs/transformers.js

What does "Error: failed to call OrtRun(). error code = 6." mean? I know it is ONNX related, but how to fix?

jquintanilla4 opened this issue · comments

Question

I keep running into the same issue when using the transformers.js automatic-speech-recognition pipeline. I've tried solving it several ways but hit a wall every time. I've done plenty of googling, asked LLMs, and drawn on my prior knowledge of how this works in Python, but I can't get it to work.

I've tried setting up my environment with and without Vite. I've tried React with JavaScript and React with TypeScript. Nothing.

Am I missing a dependency or something? Is there a place where I can look up what the error code means? I couldn't find it documented anywhere.

I've fed it an array and I've fed it a .wav file. No matter what I do, I always get the same error:

An error occurred during model execution: "Error: failed to call OrtRun(). error code = 6.".
Inputs given to model: {input_features: Proxy(Tensor)}
Error transcribing audio: Error: failed to call OrtRun(). error code = 6.
    at e.run (wasm-core-impl.ts:392:1)
    at e.run (proxy-wrapper.ts:212:1)
    at e.OnnxruntimeWebAssemblySessionHandler.run (session-handler.ts:99:1)
    at InferenceSession.run (inference-session-impl.ts:108:1)
    at sessionRun (models.js:207:1)
    at encoderForward (models.js:520:1)
    at Function.seq2seqForward [as _forward] (models.js:361:1)
    at Function.forward (models.js:820:1)
    at Function.seq2seqRunBeam [as _runBeam] (models.js:480:1)
    at Function.runBeam (models.js:1373:1)

It seems to be an ONNX Runtime issue, but I don't know how to fix it. Any guidance would be appreciated.

Note: I'm currently testing with English. Nothing fancy.

Hi there 👋 error code 6 is usually related to out-of-memory issues. Can you provide the code you are running (as well as the model being used)?

The model I was trying to use was whisper-medium.
Here's the full code for the react component:

import React, { useRef, useState, useEffect } from 'react';
import { MediaRecorder, register } from 'extendable-media-recorder';
import { connect } from 'extendable-media-recorder-wav-encoder';
import { pipeline, env, read_audio } from '@xenova/transformers';

env.allowLocalModels = false;

interface AutomaticSpeechRecognitionOutput {
    text?: string;
}

const AudioInput: React.FC = () => {
    const [isRecording, setIsRecording] = useState<boolean>(false);
    const [audioBlob, setAudioBlob] = useState<Blob | null>(null);
    const [recordTime, setRecordTime] = useState<number>(0);
    const [transcription, setTranscription] = useState<string>('');
    const mediaRecorderRef = useRef<MediaRecorder | null>(null);
    const audioChunksRef = useRef<Blob[]>([]);
    const streamRef = useRef<MediaStream | null>(null);
    const recordIntervalRef = useRef<NodeJS.Timeout | null>(null);

    useEffect(() => {
        async function setupRecorder() {
            try {
                await register(await connect());
            } catch (error: any) {
                if (error.message.includes("already an encoder stored")) {
                    console.log("Encoder already registered, continuing...");
                } else {
                    console.error('Error registering encoder:', error);
                    return;
                }
            }

            try {
                const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
                streamRef.current = stream;
            } catch (error: any) {
                console.error('Error accessing microphone:', error);
            }
        }

        setupRecorder();

        return () => {
            if (streamRef.current) {
                streamRef.current.getTracks().forEach(track => track.stop());
            }
            if (recordIntervalRef.current) {
                clearInterval(recordIntervalRef.current);
            }
        };
    }, []);

    const transcribeAudio = async (audioBlob: Blob) => {
        // const arrayBuffer = await audioBlob.arrayBuffer();
        // const audioData = new Uint8Array(arrayBuffer);
        // const audioData = new Float32Array(arrayBuffer);
        const audioURL = URL.createObjectURL(audioBlob);
        const audioData = await read_audio(audioURL, 16000);

        try {
            const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-medium');
            console.log('Transcriber initialized.'); // Confirm that the transcriber has been initialized
            const output = await transcriber(audioData, { language: 'english', task: 'transcribe', chunk_length_s: 30, stride_length_s: 5 });
            console.log('Transcription output:', output); // Log the full output object
            if (output && !Array.isArray(output) && output.text) {
                setTranscription(output.text);
            } else {
                setTranscription('No transcription output');
            }

            URL.revokeObjectURL(audioURL); // clean up the object URL after use
        } catch (error: any) {
            console.error('Error transcribing audio:', error);
        }
    };
    
    const startRecording = () => {
        if (streamRef.current) {
            mediaRecorderRef.current = new MediaRecorder(streamRef.current, { mimeType: 'audio/wav' }) as any; // any; because it's from ext library
    
            if (mediaRecorderRef.current) { // Check if mediaRecorderRef.current is not null before adding event listeners
                mediaRecorderRef.current.addEventListener('dataavailable', (event: BlobEvent) => {
                    audioChunksRef.current.push(event.data);
                });
    
                mediaRecorderRef.current.addEventListener('stop', async () => {
                    if (mediaRecorderRef.current) { // Additional check before accessing mimeType
                        const mimeType = mediaRecorderRef.current.mimeType;
                        const audioBlob = new Blob(audioChunksRef.current, { type: mimeType });
                        setAudioBlob(audioBlob);
                        audioChunksRef.current = [];
                        await transcribeAudio(audioBlob);
                    }
                });
    
                audioChunksRef.current = [];
                mediaRecorderRef.current.start();
                setIsRecording(true);
                setRecordTime(0);
                recordIntervalRef.current = setInterval(() => {
                    setRecordTime(prevTime => prevTime + 1);
                }, 1000);
            } else {
                console.error('Failed to initialize MediaRecorder');
            }
        } else {
            console.error('Stream not initialized');
        }
    };

    const stopRecording = () => {
        if (mediaRecorderRef.current && mediaRecorderRef.current.state === 'recording') {
            mediaRecorderRef.current.stop();
            setIsRecording(false);
            if (recordIntervalRef.current) {
                clearInterval(recordIntervalRef.current);
            }
        } else {
            console.error('MediaRecorder not recording or not initialized');
        }
    };

    const playAudio = () => {
        if (audioBlob) {
            const audioURL = URL.createObjectURL(audioBlob);
            const audio = new Audio(audioURL);
            audio.play().catch((error: any) => {
                console.error('Error playing the audio:', error);
                URL.revokeObjectURL(audioURL);
            });
        }
    };

    return (
        <div className="audio-input-container">
            <button onClick={startRecording}>Start Recording</button>
            <button onClick={stopRecording}>Stop Recording</button>
            <button onClick={playAudio}>Play Audio</button>
            <p>Recording: {isRecording ? `${recordTime} seconds` : 'No'}</p>
            <p>Transcription: {transcription}</p>
        </div>
    );
};

export default AudioInput;

Note that every single call to

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-medium');

allocates new memory for a pipeline (and takes a lot of time to construct the model). This is most likely the reason for your out-of-memory issues, since you call this every time you transcribe audio.
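One way around this (just a sketch, not the only pattern) is to construct the pipeline lazily a single time, e.g. in a module-level cache or a useRef, and reuse it for every transcription. The getTranscriber helper and transcriberPromise names below are purely illustrative:

import { pipeline } from '@xenova/transformers';

// Module-level cache: the model is downloaded and initialized only once.
let transcriberPromise: Promise<any> | null = null;

function getTranscriber(): Promise<any> {
    if (transcriberPromise === null) {
        transcriberPromise = pipeline('automatic-speech-recognition', 'Xenova/whisper-medium');
    }
    return transcriberPromise;
}

// Inside transcribeAudio, reuse the cached instance instead of building a new one:
// const transcriber = await getTranscriber();
// const output = await transcriber(audioData, { language: 'english', task: 'transcribe', chunk_length_s: 30, stride_length_s: 5 });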

I would also recommend selecting a smaller model, like https://huggingface.co/Xenova/whisper-base, https://huggingface.co/Xenova/whisper-small, https://huggingface.co/Xenova/whisper-tiny, https://huggingface.co/distil-whisper/distil-medium.en, or https://huggingface.co/distil-whisper/distil-small.en.
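Switching is just a matter of changing the model id passed to pipeline; for example (whisper-tiny shown purely as an illustration):

// Same pipeline call, much smaller checkpoint:
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny');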

Hope that helps!

That's good to know. I'll give those other models a shot. However, this happens on the first call.

The dev machine I'm trying to run it on has an RTX 4090. I'm surprised memory is the issue, since I've never run into memory problems when running Whisper in Python. Does WebGPU have a memory ceiling?

Thanks for your help.

The dev machine I'm trying to run it on has an RTX 4090. I'm surprised memory is the issue, since I've never run into memory problems when running Whisper in Python. Does WebGPU have a memory ceiling?

Assuming you are running Transformers.js v2, everything still runs with WASM/CPU. You can follow along with the development of v3 here, which will add WebGPU support.

Gotcha. Now it all makes sense. I'll keep an eye on v3. Thanks for the patience, and good luck with all the work ahead.