hnluo's starred repositories

HowToCook

A programmer's guide to cooking at home (Simplified Chinese only).

Language: Dockerfile | License: Unlicense | 66333 401 662

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | 30130 428 4182

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | 20575 203 372

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | 15731 104 1015

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | 13235 99 1040

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Language: Python | License: Apache-2.0 | 12335 131 204

server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Language: Python | License: BSD-3-Clause | 8000 139 3697

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

7392 323 263

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | License: NOASSERTION | 5825 57 1064

Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Language: Python | License: Apache-2.0 | 5666 66 129

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Language: Python | License: MIT | 4555 51 208

FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

Language: Python | License: MIT | 3225 31 86

fairscale

PyTorch extensions for high performance and large scale training.

Language: Python | License: NOASSERTION | 3137 45 359

modelscope-agent

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

Language: Python | License: Apache-2.0 | 2587 37 200

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | 1385 25 65

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python | License: Apache-2.0 | 1064 17 90

TensorFlowASR

TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can be written with characters or subwords.

Language: Python | License: Apache-2.0 | 929 32 207

Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | 636 9 125

GigaSpeech

Large, modern dataset for speech recognition

Language: Shell | License: Apache-2.0 | 625 18 61

sherpa

Speech-to-text server framework with next-gen Kaldi

Language: C++ | License: Apache-2.0 | 518 33 192

KAN-TTS

KAN-TTS is a speech-synthesis training framework. Please try the demos posted at https://modelscope.cn/models?page=1&tasks=text-to-speech

Language: Python | License: MIT | 482 14 68

TeleSpeech-ASR

Language: Python | 449 13 44

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.

Language: Python | License: MIT | 342 16 50

speech-recognition-papers

Towards hot directions in industrial end-to-end speech recognition

License: MIT | 325 19 2

neurst

Neural end-to-end Speech Translation Toolkit

Language: Python | License: NOASSERTION | 298 15 23

opencpop

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

208 7 3

aps

A personal toolkit for single/multi-channel speech recognition & enhancement & separation.

Language: Python | License: Apache-2.0 | 138 9 2

GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Language: Python | License: Apache-2.0 | 93 5 7

torch-mfcc

A librosa STFT/Fbank/MFCC feature extraction implementation written in PyTorch using 1D convolutions.

Language: Python | License: MIT | 72 2 2

Conformer-Athena

Dynamic Chunk Streaming and Offline Conformer based on athena-team/Athena.

Language: Python | License: Apache-2.0 | 43 1 1