hnluo's starred repositories
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
modelscope-agent
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
TensorFlowASR
:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords
Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
GigaSpeech
Large, modern dataset for speech recognition
speech-recognition-papers
Towards hot directions in industrial end to end speech recognition
GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
torch-mfcc
A librosa STFT/Fbank/mfcc feature extration written up in PyTorch using 1D Convolutions.
Conformer-Athena
Dynamic Chunk Streaming and Offline Conformer based on athena-team/Athena.