Rao Ma's starred repositories

CIF-HieraDist

[INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation

Language: Python | License: Apache-2.0 | Stargazers: 36 | Issues: 0

CIF-PyTorch

[ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (A PyTorch implementation of Continuous Integrate-and-Fire mechanism).

Language: Python | License: Apache-2.0 | Stargazers: 65 | Issues: 0

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Language: Python | License: Apache-2.0 | Stargazers: 40270 | Issues: 0

Awesome-Chinese-LLM

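A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.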

Stargazers: 14303 | Issues: 0

prepend_acoustic_attack

Prepend universal audio attack segment to mute Whisper

Language: Python | Stargazers: 8 | Issues: 0

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 7353 | Issues: 0
Language: Python | Stargazers: 2 | Issues: 0

lm-contamination

The LM Contamination Index is a manually created database of contamination evidence for LMs.

Language: Python | Stargazers: 73 | Issues: 0

cookbook

Examples and guides for using the Gemini API.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4570 | Issues: 0

libriheavy

Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context

Language: Python | License: Apache-2.0 | Stargazers: 164 | Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 11277 | Issues: 0

llm_interview_note

Notes covering the knowledge and interview questions relevant to large language model (LLM) algorithm and application engineers.

Language: HTML | Stargazers: 2090 | Issues: 0

Machine-Learning-Interviews

This repo is meant to serve as a guide for Machine Learning/AI technical interviews.

Language: Jupyter Notebook | License: MIT | Stargazers: 4130 | Issues: 0

TED-Multilingual-Parallel-Corpus

TED Parallel Corpora is a growing collection of bilingual parallel corpora, multilingual parallel corpora, and monolingual corpora extracted from TED talks (www.ted.com) for 109 world languages.

Stargazers: 239 | Issues: 0
Language: Python | Stargazers: 2 | Issues: 0

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Language: Python | Stargazers: 350 | Issues: 0
Stargazers: 1086 | Issues: 0

TikTok-Api

The Unofficial TikTok API Wrapper In Python

Language: Python | License: MIT | Stargazers: 4649 | Issues: 0

unified_multilingual_dataset_of_emotional_human_utterances

A unified dataset of multilingual emotional human utterances

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 22 | Issues: 0

mass-dataset

MaSS - Multilingual corpus of Sentence-aligned Spoken utterances

License: MIT | Stargazers: 48 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 10674 | Issues: 0

COMET

A Neural Framework for MT Evaluation

Language: Python | License: Apache-2.0 | Stargazers: 467 | Issues: 0
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 7145 | Issues: 0

covost

CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed)

Language: Python | License: NOASSERTION | Stargazers: 335 | Issues: 0

minChatGPT

A minimal example of aligning language models with RLHF, similar to ChatGPT

Language: Python | License: GPL-3.0 | Stargazers: 208 | Issues: 0

long-context-asr

Code for the paper: How Much Context Does My Attention-Based ASR System Need?

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 9 | Issues: 0

cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Language: Python | License: BSD-2-Clause | Stargazers: 219 | Issues: 0
Language: Python | License: MIT | Stargazers: 1304 | Issues: 0

comparative-assessment

Framework for using LLMs to grade texts by using pairwise comparisons.

Language: Python | Stargazers: 6 | Issues: 0

faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python | License: MIT | Stargazers: 10963 | Issues: 0