Spencer Lord's starred repositories
LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
llama3-from-scratch
Llama 3 implementation, one matrix multiplication at a time
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, and within 1% word error rate of the original model.
tinydiarize
Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens
ml-engineering
Machine Learning Engineering Open Book
whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
transformers.js
State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
asr-sd-pipeline
Speech recognition & diarisation solution with text alignment, deployed in AML pipelines
py-webrtcvad
Python interface to the WebRTC Voice Activity Detector
whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
whisper.cpp
Port of OpenAI's Whisper model in C/C++
faster-whisper
Faster Whisper transcription with CTranslate2
whispering
Streaming transcriber with Whisper
usb_4_mic_array
ReSpeaker 4 Mic Array with built-in VAD, DOA, AEC, Beamforming & NS
seeed-voicecard
2 Mic Hat, 4 Mic Array, 6-Mic Circular Array Kit, and 4-Mic Linear Array Kit for Raspberry Pi