Mickey's starred repositories

FDRL-MHAD

Dataset

Stargazers:1Issues:0Issues:0

cc_net

Tools to download and cleanup Common Crawl data

Language:PythonLicense:MITStargazers:965Issues:0Issues:0

arthas

Alibaba Java Diagnostic Tool Arthas/Alibaba Java诊断利器Arthas

Language:JavaLicense:Apache-2.0Stargazers:35537Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:6399Issues:0Issues:0

LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Language:PythonLicense:Apache-2.0Stargazers:2421Issues:0Issues:0

UnsupervisedMT

Phrase-Based & Neural Unsupervised Machine Translation

Language:PythonLicense:NOASSERTIONStargazers:1506Issues:0Issues:0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language:PythonLicense:MITStargazers:19Issues:0Issues:0

ConST

code for paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)

Language:PythonLicense:MITStargazers:62Issues:0Issues:0

MooER

MooER: Open-sourced LLM for audio understanding trained on 80,000 hours of data

Language:PythonLicense:NOASSERTIONStargazers:135Issues:0Issues:0

simul_whisper

Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Language:PythonStargazers:42Issues:0Issues:0

whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Language:PythonLicense:MITStargazers:1947Issues:0Issues:0

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Language:PythonLicense:Apache-2.0Stargazers:5653Issues:0Issues:0

SenseVoice

Multilingual Voice Understanding Model

Language:PythonLicense:NOASSERTIONStargazers:3077Issues:0Issues:0
Language:PythonStargazers:490Issues:0Issues:0

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language:PythonLicense:MITStargazers:539Issues:0Issues:0

SpeechGen

《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》

Stargazers:74Issues:0Issues:0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:10861Issues:0Issues:0

asrp

ASR text preprocessing utility

Language:PythonLicense:Apache-2.0Stargazers:20Issues:0Issues:0

LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language:PythonLicense:Apache-2.0Stargazers:32675Issues:0Issues:0

axolotl

Go ahead and axolotl questions

Language:PythonLicense:Apache-2.0Stargazers:7753Issues:0Issues:0

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Stargazers:7467Issues:0Issues:0

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language:PythonLicense:NOASSERTIONStargazers:1446Issues:0Issues:0

Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Language:ShellStargazers:8948Issues:0Issues:0

ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

Language:PythonLicense:Apache-2.0Stargazers:3896Issues:0Issues:0

TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications

Language:PythonLicense:MITStargazers:360Issues:0Issues:0

silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:4924Issues:0Issues:0

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language:PythonLicense:MITStargazers:4194Issues:0Issues:0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language:PythonLicense:MITStargazers:19906Issues:0Issues:0

bolt

Bolt is a deep learning library with high performance and heterogeneous flexibility.

Language:C++License:MITStargazers:913Issues:0Issues:0

bark

🔊 Text-Prompted Generative Audio Model

Language:Jupyter NotebookLicense:MITStargazers:35744Issues:0Issues:0