Mingjie Chen's starred repositories
LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
RTranslator
Open source real-time translation app for Android that runs locally
SenseVoice
Multilingual Voice Understanding Model
LLaMA2-Accessory
An Open-source Toolkit for LLM Development
LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
whisper_streaming
Real-time Whisper streaming for long-form speech-to-text transcription and translation
conversational-datasets
Large datasets for conversational AI
Qwen2-Audio
The official repo of the Qwen2-Audio chat and pretrained large audio language models proposed by Alibaba Cloud.
STAR-Adapt
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
MooER
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation, and speech recognition.
Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
SummaryMixing
This repository implements SummaryMixing, a simpler, faster and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is ready to be used with the SpeechBrain toolkit.
Emotion-LLaMA
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
simul_whisper
Code for our INTERSPEECH paper "Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection"
emotional-speech-annotations
This repository contains prompts & best practices for annotating audio clips with a high degree of detail using Audio-Language Models
speech-to-speech
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
ConversationalDataset
All benchmarks related to conversations