kaka-lin / ASR-notes

A practical collection of ASR models and tools — including Whisper variants and Google STT — with implementations for real-time, batch transcription, and multi-platform integration.

Repository from GitHub: https://github.com/kaka-lin/ASR-notes

ASR-notes

A collection of notes, tutorials, and implementations for Automatic Speech Recognition (ASR). Covers fundamentals, popular open-source models (like Whisper), and practical use cases such as real-time transcription and model fine-tuning.


Contents

1. ASR Fundamentals

2. Models & Architectures

3. Cloud-based ASR APIs

4. Related Tools & Repositories

  • 🔊 Multi-ASR Toolkit: A command-line and web interface for speech recognition apps using Whisper or SpeechRecognition.
  • 🧰 audio-tools: Utilities for working with audio: WAV reader/writer, recording, ALSA/tinyalsa wrappers.
  • 📊 audio-analysis-tools: Tools for spectral analysis, FFT visualization, and feature extraction.
  • 😊 speech-emotion-recognition: Deep learning models for detecting emotion from audio.

About

License: MIT License


Languages

Language: Python 100.0%