Mickey's starred repositories
LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
UnsupervisedMT
Phrase-Based & Neural Unsupervised Machine Translation
simul_whisper
Code for our INTERSPEECH paper "Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection"
whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
SenseVoice
Multilingual Voice Understanding Model
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
ms-swift
Fine-tune 350+ LLMs or 100+ MLLMs with PEFT or full-parameter training. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
TransformerCompression
Code for transformer compression methods, released to accompany our publications
silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector