sal1023

Spencer Lord's starred repositories

LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Language: Python · License: Apache-2.0 · 205600

moshi

Language: Python · License: Apache-2.0 · 577300

llama3-from-scratch

llama3 implementation, one matrix multiplication at a time

Language: Jupyter Notebook · License: MIT · 1319800
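
The "one matrix multiplication at a time" idea can be illustrated with a single causal self-attention step written as explicit matrix products in plain Python. This is a generic sketch, not code from llama3-from-scratch: it assumes one attention head over small list-of-lists matrices and deliberately omits Llama 3 details such as RoPE, RMSNorm, and grouped-query attention.

```python
import math

def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix using explicit loops."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    # Subtract the row maximum for numerical stability; math.exp(-inf) evaluates to 0.0.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention where position i only sees positions 0..i."""
    d = len(q[0])
    scores = matmul(q, transpose(k))                      # Q K^T
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    masked = [                                            # hide future positions
        [s if j <= i else float("-inf") for j, s in enumerate(row)]
        for i, row in enumerate(scaled)
    ]
    weights = [softmax(row) for row in masked]
    return matmul(weights, v)                             # weighted sum of values

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = causal_attention(q, k, v)
```

With the causal mask, the first position can only attend to itself, so its output row is exactly the first value row.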

espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

Language: C · License: GPL-3.0 · 408500

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript · License: NOASSERTION · 4680300

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · 2359500

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook · License: MIT · 598600
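
A speaker-embedding stage is usually followed by clustering, so that segments whose embeddings point in similar directions receive the same speaker label. The sketch below is a deliberately simple greedy cosine-similarity clustering, assuming one fixed-length embedding per segment and a hypothetical `threshold` value; it is not pyannote-audio's API or its clustering method.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def greedy_cluster(embeddings, threshold=0.8):
    """Assign each embedding to its most similar cluster, or open a new cluster.

    `threshold` is a hypothetical tuning value, not a pyannote-audio default.
    """
    sums = []    # running sum of member embeddings (same direction as the cluster mean)
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, s) for s in sums]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            sums[best] = [s + e for s, e in zip(sums[best], emb)]
        else:
            best = len(sums)
            sums.append(list(emb))
        labels.append(best)
    return labels
```

Production diarization systems generally use more robust clustering (for example agglomerative or spectral methods), since a greedy pass depends on the order in which segments arrive.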

distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Language: Python · License: MIT · 352300
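
The distil-whisper description quotes a word error rate (WER) figure. WER is the word-level edit distance between a reference and a hypothesis, (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and no text normalization (published evaluations usually lowercase and strip punctuation first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, "the cat sat on the mat" against "the cat sit on mat" has one substitution and one deletion, so WER = 2 / 6.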

tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

Language: Python · License: MIT · 42900
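
Because tinydiarize marks speaker changes with a special token rather than naming speakers, downstream code can split the decoded text at that marker. The marker string below is an assumption (a placeholder for whatever token text your build actually emits), and the function only segments turns; it does not identify who is speaking.

```python
from typing import List

def split_speaker_turns(transcript: str, marker: str = "[SPEAKER_TURN]") -> List[str]:
    """Split a transcript at speaker-turn markers and drop empty pieces.

    The default `marker` is an assumed placeholder; pass the token text your model outputs.
    """
    return [piece.strip() for piece in transcript.split(marker) if piece.strip()]
```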

low_cost_robot

Language: Python · License: MIT · 299300

scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Language: Python · License: Apache-2.0 · 1161500

ml-engineering

Machine Learning Engineering Open Book

Language: Python · License: CC-BY-SA-4.0 · 1107400

whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Language: Python · License: MIT · 183600
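
Streaming with a model built for offline transcription needs a rule for when tentative words become final. One such rule is a LocalAgreement-2 style policy: commit only the words that two consecutive hypotheses agree on. This is our hedged reading of the kind of approach whisper_streaming describes, not its code, and the sketch assumes each hypothesis is the full word list for all audio received so far.

```python
def common_prefix(a, b):
    """Return the longest shared prefix of two word lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]

class LocalAgreement2:
    """Commit words once two consecutive hypotheses agree on them."""

    def __init__(self):
        self.committed = []  # words already emitted as final
        self.previous = []   # last hypothesis seen

    def update(self, hypothesis):
        agreed = common_prefix(self.previous, hypothesis)
        newly = agreed[len(self.committed):]  # empty if agreement adds nothing new
        self.committed.extend(newly)
        self.previous = hypothesis
        return newly
```

The trade-off is latency for stability: a word is emitted one update later than it first appears, but only after a second pass has confirmed it.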

transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

Language: JavaScript · License: Apache-2.0 · 1106700

tabby

Self-hosted AI coding assistant

Language: Rust · License: NOASSERTION · 2127700

watlings

Learn WebAssembly by writing small programs!

Language: JavaScript · License: Unlicense · 162700

asr-sd-pipeline

Speech recognition & diarisation solution with text alignment, deployed in AML pipelines

Language: Python · License: MIT · 8100

py-webrtcvad

Python interface to the WebRTC Voice Activity Detector

Language: C · License: NOASSERTION · 202200
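
py-webrtcvad classifies fixed-size frames of 16-bit mono PCM and accepts only 10, 20, or 30 ms frames at 8000, 16000, 32000, or 48000 Hz, so callers usually slice their audio first. At 16 kHz, a 30 ms frame is 480 samples, or 960 bytes. A minimal framing helper, assuming raw 16-bit mono PCM bytes; `pcm_frames` is our own hypothetical helper, not part of the library:

```python
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)

def pcm_frames(pcm: bytes, sample_rate: int, frame_ms: int = 30):
    """Yield consecutive full frames of 16-bit mono PCM; a trailing partial frame is dropped."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * 2  # 2 bytes per 16-bit sample
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[offset:offset + frame_bytes]
```

Each yielded frame can then be passed to the library's `is_speech(frame, sample_rate)` method on a `webrtcvad.Vad(mode)` instance, where `mode` (0 to 3) sets how aggressively non-speech is filtered out.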

whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Language: Jupyter Notebook · License: BSD-2-Clause · 337600
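
Combining ASR with speaker diarization means reconciling two timelines: word timestamps from the recognizer and speaker segments from the diarizer. One simple rule, assumed here for illustration rather than taken from whisper-diarization, labels each word with the speaker whose segment overlaps it the most; words that overlap no segment get `None`.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]     # (text, start_s, end_s)
Segment = Tuple[str, float, float]  # (speaker, start_s, end_s)

def assign_speakers(words: List[Word], segments: List[Segment]) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose segment overlaps it for the longest time."""
    labelled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for speaker, s_start, s_end in segments:
            overlap = min(w_end, s_end) - max(w_start, s_start)  # negative when disjoint
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labelled.append((text, best_speaker))
    return labelled
```

A word that straddles a speaker change goes to whichever side it overlaps more, which is usually adequate when word timestamps are accurate.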

stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Language: Python · License: MIT · 149800

whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C · License: MIT · 3468800

faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python · License: MIT · 1158700

whispering

Streaming transcriber with Whisper

Language: Python · License: MIT · 68300

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python · License: BSD-2-Clause · 1160400

porcupine

On-device wake word detection powered by deep learning

Language: Python · License: Apache-2.0 · 370200

usb_4_mic_array

ReSpeaker 4 Mic Array with built-in VAD, DOA, AEC, Beamforming & NS

Language: Python · License: Apache-2.0 · 14100

mic_array

DOA, VAD and KWS for ReSpeaker Microphone Array

Language: Python · License: Apache-2.0 · 28700
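
For two microphones spaced d metres apart and a far-field source, the time difference of arrival τ relates to the arrival angle θ (measured from broadside) by sin θ = c·τ / d, where c ≈ 343 m/s in air at room temperature. The sketch below estimates τ with a brute-force cross-correlation over integer sample lags and converts it to an angle. Real arrays typically use more robust estimators (GCC-PHAT is a common choice), so treat this as an illustration of the geometry only; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 C

def estimate_lag(x, y, max_lag):
    """Return the integer lag L maximizing sum(x[n] * y[n + L]); positive L means y arrives later."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def doa_degrees(tau, mic_distance, c=SPEED_OF_SOUND):
    """Far-field arrival angle from broadside, in degrees, for a two-microphone pair."""
    # Clamp because noise can push |c * tau| slightly above the physical limit d.
    ratio = max(-1.0, min(1.0, c * tau / mic_distance))
    return math.degrees(math.asin(ratio))
```

For example, a 2-sample lag at 16 kHz (τ = 125 µs) across a 0.1 m pair gives sin θ ≈ 0.429, or roughly 25 degrees; the sign of θ depends on which microphone is taken as the reference.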

avs

Python implementation of an Alexa Voice Service app, with DuerOS support

Language: Python · License: NOASSERTION · 19500

ec

Echo Canceller, part of Voice Engine project

Language: C · License: GPL-3.0 · 24400
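
An echo canceller subtracts an estimate of the loudspeaker (far-end) signal as it reaches the microphone. A textbook way to learn that echo path is a normalized least-mean-squares (NLMS) adaptive filter. The sketch below implements that generic technique with hypothetical parameter values (`taps`, `mu`); it is not the algorithm or API of the ec project, and it assumes the far-end and microphone signals are already time-aligned lists of floats.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Return (residual, weights): mic with the estimated echo removed, and the learned filter."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate               # what remains after removing the estimated echo
        power = eps + sum(h * h for h in history)
        step = mu * error / power               # normalization makes the step size scale-free
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights

# Demo: synthetic, hypothetical echo path with pure echo and no near-end speech.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1, 0.0]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n >= k)
    for n in range(len(far_end))
]
residual, learned = nlms_echo_cancel(far_end, mic)
```

In this noise-free demo the learned weights converge to the echo path and the residual decays toward zero; real deployments also need double-talk detection so the filter does not adapt while the near-end speaker is talking.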

seeed-voicecard

2 Mic Hat, 4 Mic Array, 6-Mic Circular Array Kit, and 4-Mic Linear Array Kit for Raspberry Pi

Language: C · License: GPL-3.0 · 48000