hnluo

Company: Alibaba Group

Location: Hangzhou

hnluo's starred repositories

HowToCook

Programmer's guide to cooking at home (Simplified Chinese only).

Language: Dockerfile | License: Unlicense | Stargazers: 66333 | Issues: 401 | Issues: 662

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 30130 | Issues: 428 | Issues: 4182

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 20575 | Issues: 203 | Issues: 372

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 15731 | Issues: 104 | Issues: 1015

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 13235 | Issues: 99 | Issues: 1040

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.

Language: Python | License: Apache-2.0 | Stargazers: 12335 | Issues: 131 | Issues: 204

server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Language: Python | License: BSD-3-Clause | Stargazers: 8000 | Issues: 139 | Issues: 3697

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | License: NOASSERTION | Stargazers: 5825 | Issues: 57 | Issues: 1064

Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Language: Python | License: Apache-2.0 | Stargazers: 5666 | Issues: 66 | Issues: 129

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Language: Python | License: MIT | Stargazers: 4555 | Issues: 51 | Issues: 208

FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.

Language: Python | License: MIT | Stargazers: 3225 | Issues: 31 | Issues: 86

fairscale

PyTorch extensions for high performance and large scale training.

Language: Python | License: NOASSERTION | Stargazers: 3137 | Issues: 45 | Issues: 359

modelscope-agent

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

Language: Python | License: Apache-2.0 | Stargazers: 2587 | Issues: 37 | Issues: 200

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1385 | Issues: 25 | Issues: 65

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python | License: Apache-2.0 | Stargazers: 1064 | Issues: 17 | Issues: 90

TensorFlowASR

TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can be written with characters or subwords.

Language: Python | License: Apache-2.0 | Stargazers: 929 | Issues: 32 | Issues: 207

Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 636 | Issues: 9 | Issues: 125

GigaSpeech

Large, modern dataset for speech recognition

Language: Shell | License: Apache-2.0 | Stargazers: 625 | Issues: 18 | Issues: 61

sherpa

Speech-to-text server framework with next-gen Kaldi

Language: C++ | License: Apache-2.0 | Stargazers: 518 | Issues: 33 | Issues: 192

KAN-TTS

KAN-TTS is a speech-synthesis training framework. Demos are available at https://modelscope.cn/models?page=1&tasks=text-to-speech

Language: Python | License: MIT | Stargazers: 482 | Issues: 14 | Issues: 68

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

Language: Python | License: MIT | Stargazers: 342 | Issues: 16 | Issues: 50

speech-recognition-papers

Towards hot directions in industrial end-to-end speech recognition

neurst

Neural end-to-end Speech Translation Toolkit

Language: Python | License: NOASSERTION | Stargazers: 298 | Issues: 15 | Issues: 23

opencpop

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

aps

A personal toolkit for single/multi-channel speech recognition & enhancement & separation.

Language: Python | License: Apache-2.0 | Stargazers: 138 | Issues: 9 | Issues: 2

GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Language: Python | License: Apache-2.0 | Stargazers: 93 | Issues: 5 | Issues: 7

torch-mfcc

A librosa STFT/Fbank/MFCC feature extraction written in PyTorch using 1D convolutions.

Language: Python | License: MIT | Stargazers: 72 | Issues: 2 | Issues: 2

Conformer-Athena

Dynamic Chunk Streaming and Offline Conformer based on athena-team/Athena.

Language: Python | License: Apache-2.0 | Stargazers: 43 | Issues: 1 | Issues: 1