xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

Home Page: https://huggingface.co/docs/transformers.js

What does "Error: failed to call OrtRun(). error code = 6." mean? I know it is ONNX related, but how to fix?

jquintanilla4 opened this issue · comments

Question

I keep running into the same issue when using the transformers.js automatic-speech-recognition pipeline. I've tried solving it several ways but hit a wall every time. I've done plenty of googling, asked LLMs, and drawn on my prior knowledge of how this works in Python, but I can't get it to work.

I've tried setting up my environment with and without Vite. I've tried React with JavaScript and React with TypeScript. Nothing.

Am I missing a dependency or something? Is there a place where I can look up what the error code means? I couldn't find it documented anywhere.

I've fed it an array and I've fed it a .wav file. No matter what I do, I always get the same error:

An error occurred during model execution: "Error: failed to call OrtRun(). error code = 6.".
Inputs given to model: {input_features: Proxy(Tensor)}
Error transcribing audio: Error: failed to call OrtRun(). error code = 6.
    at e.run (wasm-core-impl.ts:392:1)
    at e.run (proxy-wrapper.ts:212:1)
    at e.OnnxruntimeWebAssemblySessionHandler.run (session-handler.ts:99:1)
    at InferenceSession.run (inference-session-impl.ts:108:1)
    at sessionRun (models.js:207:1)
    at encoderForward (models.js:520:1)
    at Function.seq2seqForward [as _forward] (models.js:361:1)
    at Function.forward (models.js:820:1)
    at Function.seq2seqRunBeam [as _runBeam] (models.js:480:1)
    at Function.runBeam (models.js:1373:1)

It seems to be an ONNX Runtime issue, but I don't know how to fix it. Any guidance would be appreciated.

Note: I'm currently testing with English. Nothing fancy.

Hi there 👋 error code 6 is usually related to out-of-memory issues. Can you provide the code you are running (as well as the model being used)?

The model I was trying to use was whisper-medium.
Here's the full code for the react component:

import React, { useRef, useState, useEffect } from 'react';
import { MediaRecorder, register } from 'extendable-media-recorder';
import { connect } from 'extendable-media-recorder-wav-encoder';
import { pipeline, env, read_audio } from '@xenova/transformers';

env.allowLocalModels = false;

interface AutomaticSpeechRecognitionOutput {
    text?: string;
}

const AudioInput: React.FC = () => {
    const [isRecording, setIsRecording] = useState<boolean>(false);
    const [audioBlob, setAudioBlob] = useState<Blob | null>(null);
    const [recordTime, setRecordTime] = useState<number>(0);
    const [transcription, setTranscription] = useState<string>('');
    const mediaRecorderRef = useRef<MediaRecorder | null>(null);
    const audioChunksRef = useRef<Blob[]>([]);
    const streamRef = useRef<MediaStream | null>(null);
    const recordIntervalRef = useRef<NodeJS.Timeout | null>(null);

    useEffect(() => {
        async function setupRecorder() {
            try {
                await register(await connect());
            } catch (error: any) {
                if (error.message.includes("already an encoder stored")) {
                    console.log("Encoder already registered, continuing...");
                } else {
                    console.error('Error registering encoder:', error);
                    return;
                }
            }

            try {
                const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
                streamRef.current = stream;
            } catch (error: any) {
                console.error('Error accessing microphone:', error);
            }
        }

        setupRecorder();

        return () => {
            if (streamRef.current) {
                streamRef.current.getTracks().forEach(track => track.stop());
            }
            if (recordIntervalRef.current) {
                clearInterval(recordIntervalRef.current);
            }
        };
    }, []);

    const transcribeAudio = async (audioBlob: Blob) => {
        // const arrayBuffer = await audioBlob.arrayBuffer();
        // const audioData = new Uint8Array(arrayBuffer);
        // const audioData = new Float32Array(arrayBuffer);
        const audioURL = URL.createObjectURL(audioBlob);
        const audioData = await read_audio(audioURL, 16000);

        try {
            const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-medium');
            console.log('Transcriber initialized.'); // Confirm that the transcriber has been initialized
            const output = await transcriber(audioData, { language: 'english', task: 'transcribe', chunk_length_s: 30, stride_length_s: 5 });
            console.log('Transcription output:', output); // Log the full output object
            if (output && !Array.isArray(output) && output.text) {
                setTranscription(output.text);
            } else {
                setTranscription('No transcription output');
            }

            URL.revokeObjectURL(audioURL); // clean up the object URL after use
        } catch (error: any) {
            console.error('Error transcribing audio:', error);
        }
    };
    
    const startRecording = () => {
        if (streamRef.current) {
            mediaRecorderRef.current = new MediaRecorder(streamRef.current, { mimeType: 'audio/wav' }) as any; // any; because it's from ext library
    
            if (mediaRecorderRef.current) { // Check if mediaRecorderRef.current is not null before adding event listeners
                mediaRecorderRef.current.addEventListener('dataavailable', (event: BlobEvent) => {
                    audioChunksRef.current.push(event.data);
                });
    
                mediaRecorderRef.current.addEventListener('stop', async () => {
                    if (mediaRecorderRef.current) { // Additional check before accessing mimeType
                        const mimeType = mediaRecorderRef.current.mimeType;
                        const audioBlob = new Blob(audioChunksRef.current, { type: mimeType });
                        setAudioBlob(audioBlob);
                        audioChunksRef.current = [];
                        await transcribeAudio(audioBlob);
                    }
                });
    
                audioChunksRef.current = [];
                mediaRecorderRef.current.start();
                setIsRecording(true);
                setRecordTime(0);
                recordIntervalRef.current = setInterval(() => {
                    setRecordTime(prevTime => prevTime + 1);
                }, 1000);
            } else {
                console.error('Failed to initialize MediaRecorder');
            }
        } else {
            console.error('Stream not initialized');
        }
    };

    const stopRecording = () => {
        if (mediaRecorderRef.current && mediaRecorderRef.current.state === 'recording') {
            mediaRecorderRef.current.stop();
            setIsRecording(false);
            if (recordIntervalRef.current) {
                clearInterval(recordIntervalRef.current);
            }
        } else {
            console.error('MediaRecorder not recording or not initialized');
        }
    };

    const playAudio = () => {
        if (audioBlob) {
            const audioURL = URL.createObjectURL(audioBlob);
            const audio = new Audio(audioURL);
            audio.play().catch((error: any) => {
                console.error('Error playing the audio:', error);
                URL.revokeObjectURL(audioURL);
            });
        }
    };

    return (
        <div className="audio-input-container">
            <button onClick={startRecording}>Start Recording</button>
            <button onClick={stopRecording}>Stop Recording</button>
            <button onClick={playAudio}>Play Audio</button>
            <p>Recording: {isRecording ? `${recordTime} seconds` : 'No'}</p>
            <p>Transcription: {transcription}</p>
        </div>
    );
};

export default AudioInput;

Note that every single call to

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-medium');

allocates new memory for a pipeline (and takes a lot of time to construct the model). This is most likely the reason for your out-of-memory issues, since you call this every time you transcribe audio.
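One way around this (just a sketch, not the only pattern) is to construct the pipeline lazily a single time, e.g. in a module-level cache or a useRef, and reuse it for every transcription. The getTranscriber helper and transcriberPromise names below are purely illustrative:

import { pipeline } from '@xenova/transformers';

// Module-level cache: the model is downloaded and initialized only once.
let transcriberPromise: Promise<any> | null = null;

function getTranscriber(): Promise<any> {
    if (transcriberPromise === null) {
        transcriberPromise = pipeline('automatic-speech-recognition', 'Xenova/whisper-medium');
    }
    return transcriberPromise;
}

// Inside transcribeAudio, reuse the cached instance instead of building a new one:
// const transcriber = await getTranscriber();
// const output = await transcriber(audioData, { language: 'english', task: 'transcribe', chunk_length_s: 30, stride_length_s: 5 });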

I would also recommend selecting a smaller model, like https://huggingface.co/Xenova/whisper-base, https://huggingface.co/Xenova/whisper-small, https://huggingface.co/Xenova/whisper-tiny, https://huggingface.co/distil-whisper/distil-medium.en, or https://huggingface.co/distil-whisper/distil-small.en.
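Switching is just a matter of changing the model id passed to pipeline; for example (whisper-tiny shown purely as an illustration):

// Same pipeline call, much smaller checkpoint:
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny');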

Hope that helps!

That's good to know. I'll give those other models a shot. However, this happens on the first call.

The dev machine I'm trying to run it on has an RTX 4090. I'm surprised memory is the issue, since I've never run into memory problems when running Whisper in Python. Does WebGPU have a memory ceiling?

Thanks for your help.

The dev machine I'm trying to run it on has an RTX 4090. I'm surprised memory is the issue, since I've never run into memory problems when running Whisper in Python. Does WebGPU have a memory ceiling?

Assuming you are running Transformers.js v2, everything still runs with WASM/CPU. You can follow along with the development of v3 here, which will add WebGPU support.

Gotcha. Now it all makes sense. I'll keep an eye on v3. Thanks for the patience, and good luck with all the work ahead.