RickyL-2000

ZJU

Ruiqi Li's starred repositories

CogVideo

Text-to-video generation. The repo for ICLR2023 paper "CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers"

Language: Python, License: Apache-2.0, 355200

pytorchvideo

A deep learning library for video understanding research.

Language: Python, License: Apache-2.0, 323400

acad-homepage.github.io

AcadHomepage: A Modern and Responsive Academic Personal Homepage

Language: SCSS, License: MIT, 113100

av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

Language: Python, License: NOASSERTION, 4300

CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Language: Python, License: Apache-2.0, 229900

RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Language: Python, 70400

speechmetrics

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

Language: Python, License: MIT, 87000

ChatTTS

A generative speech model for daily dialogue.

Language: Python, License: AGPL-3.0, 2795700

llama3

The official Meta Llama 3 GitHub site

Language: Python, License: NOASSERTION, 2338100

ROSVOT

Robust Singing Voice Transcription and MIDI Extraction

Language: Python, 2800

Prompt-Singer

Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).

Language: Python, License: MIT, 4900

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Language: Jupyter Notebook, License: Apache-2.0, 1249700

parler-tts

Inference and training library for high-quality TTS models.

Language: Python, License: Apache-2.0, 288800

VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Language: Python, License: MIT, 45600

VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Language: Python, License: NOASSERTION, 127100

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python, License: Apache-2.0, 114000

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python, License: MIT, 190800

Omost

Your image is almost there!

Language: Python, License: Apache-2.0, 692200

autochord

Automatic Chord Recognition tools - ISMIR2021 Late-Breaking Demo presentation

Language: Jupyter Notebook, License: Apache-2.0, 9900

MERT

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".

Language: Python, License: Apache-2.0, 27200

OpenVoice

Instant voice cloning by MyShell.

Language: Python, License: MIT, 2736300

WeChatMsg

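Extract WeChat chat history and export it to HTML, Word, and Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant.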

Language: Python, License: GPL-3.0, 3157300

BeatNet

BeatNet is a state-of-the-art real-time and offline joint music beat, downbeat, tempo, and meter tracking system using CRNN and particle filtering (implementation of an ISMIR 2021 paper).

Language: Python, License: CC-BY-4.0, 30600

Lyrics-Conditioned-Neural-Melody-Generation

Language: Jupyter Notebook, 41700

muzic

Muzic: Music Understanding and Generation with Artificial Intelligence

Language: Python, License: MIT, 437300

musegan

An AI for Music Generation

Language: Python, License: MIT, 177300

audioldm_eval

This toolbox aims to unify audio generation model evaluation for easier comparison.

Language: Python, License: MIT, 27700

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python, License: NOASSERTION, 234400

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python, License: MIT, 2024800

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++, License: Apache-2.0, 986100