mingjie chen's starred repositories

LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | License: Apache-2.0 | Stargazers: 33872 | Watchers: 209 | Issues: 5185

RTranslator

Open source real-time translation app for Android that runs locally

Language: C++ | License: Apache-2.0 | Stargazers: 6770 | Watchers: 50 | Issues: 64

CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Language: Python | License: Apache-2.0 | Stargazers: 6119 | Watchers: 58 | Issues: 485

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: NOASSERTION | Stargazers: 3339 | Watchers: 38 | Issues: 132

mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Language: Python | License: MIT | Stargazers: 3063 | Watchers: 97 | Issues: 110

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python | License: NOASSERTION | Stargazers: 2717 | Watchers: 37 | Issues: 137

LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Language: Python | License: Apache-2.0 | Stargazers: 2527 | Watchers: 28 | Issues: 46

MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 2523 | Watchers: 34 | Issues: 47

whisper_streaming

Whisper real-time streaming for long speech-to-text transcription and translation

Language: Python | License: MIT | Stargazers: 2037 | Watchers: 37 | Issues: 106
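
The core trick behind this kind of streaming transcription is a LocalAgreement policy: re-run Whisper on a growing audio buffer and commit only the prefix that consecutive hypotheses agree on, since that part is unlikely to be revised as more audio arrives. Below is a minimal, self-contained sketch of that policy; the class and method names are illustrative, not the repo's actual API.

```python
# Minimal sketch of a LocalAgreement-style commit policy (illustrative
# names, not the repo's API): emit only the longest common prefix shared
# by the last two hypotheses.

class LocalAgreementPolicy:
    def __init__(self):
        self.committed = []        # tokens already emitted to the user
        self.prev_hypothesis = []  # full hypothesis from the previous pass

    def update(self, hypothesis):
        """Feed the latest full hypothesis; return newly committed tokens."""
        # Longest common prefix of the last two hypotheses.
        agreed = []
        for a, b in zip(self.prev_hypothesis, hypothesis):
            if a != b:
                break
            agreed.append(a)
        self.prev_hypothesis = hypothesis
        # Emit only the agreed tokens that were not committed before.
        new_tokens = agreed[len(self.committed):]
        self.committed.extend(new_tokens)
        return new_tokens

policy = LocalAgreementPolicy()
policy.update(["the", "cat", "sat"])             # -> [] (no agreement yet)
policy.update(["the", "cat", "sat", "on"])       # -> ["the", "cat", "sat"]
policy.update(["the", "cat", "sat", "on", "a"])  # -> ["on"]
```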

conversational-datasets

Large datasets for conversational AI

Language: Python | License: Apache-2.0 | Stargazers: 1294 | Watchers: 74 | Issues: 30

Qwen2-Audio

The official repo of the Qwen2-Audio chat and pretrained large audio-language models proposed by Alibaba Cloud.

mar

PyTorch implementation of MAR+DiffLoss (https://arxiv.org/abs/2406.11838)

Language: Python | License: MIT | Stargazers: 981 | Watchers: 18 | Issues: 68
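
The paper's central idea is to replace the categorical cross-entropy used with discrete image tokens by a "diffusion loss": a small denoising network models each continuous token conditioned on the autoregressive backbone's output. The sketch below illustrates that loss under simplified assumptions (a linear alpha-bar schedule, an MLP denoiser, made-up dimensions); it is not the repository's implementation.

```python
# Hedged sketch of a per-token diffusion loss: noise the continuous token,
# ask a small MLP to predict the noise given the condition z, and train
# with MSE. All dimensions and the schedule are illustrative.
import torch
import torch.nn as nn

class DiffusionLossHead(nn.Module):
    def __init__(self, token_dim=16, cond_dim=256, hidden=512, steps=1000):
        super().__init__()
        self.steps = steps
        # alpha_bar[t]: how much of the clean signal survives at step t.
        self.register_buffer("alpha_bar", torch.linspace(0.9999, 0.01, steps))
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, x, z):
        """x: (B, token_dim) continuous tokens; z: (B, cond_dim) conditions."""
        t = torch.randint(0, self.steps, (x.shape[0],), device=x.device)
        a = self.alpha_bar[t].unsqueeze(-1)                   # (B, 1)
        noise = torch.randn_like(x)
        x_t = a.sqrt() * x + (1 - a).sqrt() * noise           # noised token
        t_feat = (t.float() / self.steps).unsqueeze(-1)       # timestep feature
        pred = self.net(torch.cat([x_t, z, t_feat], dim=-1))  # predict the noise
        return nn.functional.mse_loss(pred, noise)

head = DiffusionLossHead()
loss = head(torch.randn(8, 16), torch.randn(8, 256))  # scalar training loss
```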

STAR-Adapt

Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"

bc-omni

Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊

MooER

MooER: Moore Threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering, but not limited to, end-to-end speech interaction, end-to-end speech translation, and speech recognition.

Language: Python | License: NOASSERTION | Stargazers: 147 | Watchers: 5 | Issues: 13

EmoBox

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Parameter-Efficient-MoE

Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Language: Python | License: Apache-2.0 | Stargazers: 129 | Watchers: 4 | Issues: 8
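
The title describes turning a dense checkpoint into a Mixture-of-Experts model while adding few trainable parameters. As a generic illustration of that dense-to-MoE upcycling idea (not necessarily the paper's exact construction), the sketch below shares one frozen dense FFN across all experts, gives each expert only a small low-rank adapter, and mixes the top-2 experts per token with a learned router.

```python
# Generic parameter-efficient MoE sketch: frozen shared FFN + per-expert
# low-rank adapters + top-2 routing. Illustrative, not the paper's code.
import torch
import torch.nn as nn

class ParameterEfficientMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=4, rank=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Shared dense FFN, frozen: the pretrained weights stay untouched.
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                    nn.Linear(d_ff, d_model))
        for p in self.shared.parameters():
            p.requires_grad = False
        # Per-expert low-rank adapters: the only new trainable parameters.
        self.down = nn.ModuleList(nn.Linear(d_model, rank, bias=False)
                                  for _ in range(n_experts))
        self.up = nn.ModuleList(nn.Linear(rank, d_model, bias=False)
                                for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                # x: (B, T, d_model)
        probs = self.router(x).softmax(-1)               # (B, T, n_experts)
        topw, topi = probs.topk(self.top_k, dim=-1)
        # Sparse gate: keep only each token's top-k expert weights.
        gate = torch.zeros_like(probs).scatter(-1, topi, topw)
        out = self.shared(x)
        # For clarity every expert runs on all tokens; real MoE layers
        # dispatch tokens to their selected experts only.
        for e, (down, up) in enumerate(zip(self.down, self.up)):
            out = out + gate[..., e:e+1] * up(down(x))
        return out

layer = ParameterEfficientMoE()
y = layer(torch.randn(2, 10, 512))
```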

AudioLLM

Audio Large Language Models

RSTnet

Real-time Speech-Text Foundation Model Toolkit (wip)

GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Language: Python | License: Apache-2.0 | Stargazers: 114 | Watchers: 6 | Issues: 8

SummaryMixing

This repository implements SummaryMixing, a simpler, faster, and much cheaper replacement for self-attention in automatic speech recognition (see https://arxiv.org/abs/2307.07421). The code is ready to be used with the SpeechBrain toolkit.

Language: Python | License: NOASSERTION | Stargazers: 111 | Watchers: 10 | Issues: 3
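
The paper replaces self-attention's O(T²) pairwise scores with a single per-utterance summary vector: each frame gets a local transformation, the summary is the time average of another transformation, and a linear layer combines the two, giving linear-time mixing. A minimal PyTorch sketch of that idea with illustrative layer sizes (this is not the SpeechBrain implementation):

```python
# Minimal SummaryMixing sketch: local per-frame transform + time-averaged
# global summary, concatenated and projected back. Sizes are illustrative.
import torch
import torch.nn as nn

class SummaryMixing(nn.Module):
    def __init__(self, d_model=256, d_hidden=256):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU())
        self.summary = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU())
        self.combine = nn.Linear(2 * d_hidden, d_model)

    def forward(self, x):                                # x: (B, T, d_model)
        f = self.local(x)                                # per-frame, (B, T, H)
        s = self.summary(x).mean(dim=1, keepdim=True)    # summary, (B, 1, H)
        s = s.expand(-1, x.shape[1], -1)                 # broadcast over time
        return self.combine(torch.cat([f, s], dim=-1))   # O(T), not O(T^2)

layer = SummaryMixing()
y = layer(torch.randn(2, 100, 256))
```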

Emotion-LLaMA

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Language: Python | License: BSD-3-Clause | Stargazers: 102 | Watchers: 5 | Issues: 19

simul_whisper

Code for our INTERSPEECH paper "Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection"

Dasheng

Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"

Language: Python | License: Apache-2.0 | Stargazers: 43 | Watchers: 3 | Issues: 3
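
Masked audio encoder learning follows the MAE recipe: split the spectrogram into patches, mask most of them, encode only the visible patches, and train a decoder to reconstruct the rest. The snippet below sketches just the random masking step under assumed shapes; it is not Dasheng's code.

```python
# Generic MAE-style masking step for audio: keep a random subset of
# spectrogram patches per example. Shapes and mask ratio are assumptions.
import torch

def random_patch_mask(patches, mask_ratio=0.75):
    """patches: (B, N, D). Returns visible patches and their kept indices."""
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    # Independent random permutation per example; keep the first n_keep.
    perm = torch.rand(b, n).argsort(dim=1)
    keep = perm[:, :n_keep]                               # (B, n_keep)
    visible = patches.gather(1, keep.unsqueeze(-1).expand(-1, -1, d))
    return visible, keep

spec_patches = torch.randn(4, 128, 768)  # e.g. 128 patches of a mel spectrogram
visible, keep = random_patch_mask(spec_patches)           # visible: (4, 32, 768)
```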

emotional-speech-annotations

This repository contains prompts and best practices for annotating audio clips in very high detail using audio-language models

License: Apache-2.0 | Stargazers: 28 | Watchers: 4 | Issues: 0

speech-to-speech

Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"

Language: Python | Stargazers: 28 | Watchers: 2 | Issues: 0

ConversationalDataset

All benchmarks related to conversations

Language: Jupyter Notebook | Stargazers: 4 | Watchers: 0 | Issues: 0