Jason's Lab's repositories
av_hubert
A self-supervised learning framework for audio-visual speech
avsr-conformer
AVSR with NIA
AVSR_papers
This repository mainly collects papers on transformation between three modalities: audio, visual, and text.
ColossalAI
Making big AI models cheaper, easier, and more scalable
Conference-Acceptance-Rate
Acceptance rates for the major AI conferences
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
e2e_lfmmi
E2E system with LF-MMI; word N-gram for Mandarin
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
hugo
The world’s fastest framework for building websites.
Leveraging-Self-Supervised-Learning-for-AVSR
Official PyTorch implementation of paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition
lit-llama
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
LLaMA-Adapter
Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Multimodal-GPT
Multimodal-GPT
open_flamingo
An open-source framework for training large multimodal models.
OpenFace
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
PaddleSpeech
An Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.
ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
rnn-transducer
A PyTorch Implementation of Transducer Model for End-to-End Speech Recognition
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable), so it combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
sherpa
Speech-to-text server framework with next-gen Kaldi
sherpa-onnx
Real-time speech recognition using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
voxceleb_trainer
In defence of metric learning for speaker recognition
voxpopuli
A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
youtube-dl
Command-line program to download videos from YouTube.com and other video sites