yearnyeen ho's starred repositories

visualization-curriculum

A data visualization curriculum of interactive notebooks.

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 1269 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4133 | Issues: 0

EnCLAP

Official Implementation of EnCLAP (ICASSP 2024)

Language: Python | License: MIT | Stargazers: 83 | Issues: 0

StreamMultiDiffusion

Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."

Language: Jupyter Notebook | License: MIT | Stargazers: 479 | Issues: 0

awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation

License: MIT | Stargazers: 223 | Issues: 0

real-time-lyrics-alignment

Codebase for 'A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance', ICASSP 2024

Language: Python | License: NOASSERTION | Stargazers: 10 | Issues: 0

timbre-trap

Code for the paper "Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription"

Language: Python | License: MIT | Stargazers: 27 | Issues: 0

pflow-encodec

Implementation of a TTS model based on NVIDIA's P-Flow TTS paper

Language: Python | Stargazers: 63 | Issues: 0

torchinfo

View model summaries in PyTorch!

Language: Python | License: MIT | Stargazers: 2378 | Issues: 0

audio-representations

JEPAs for audio representation learning

Language: Python | Stargazers: 10 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda | License: MIT | Stargazers: 21015 | Issues: 0

VoiceFlow-TTS

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Language: Python | Stargazers: 254 | Issues: 0

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

Stargazers: 469 | Issues: 0

Call-Response

Responding to the Call: Exploring Automatic Music Composition Using a Knowledge-Enhanced Model

Language: Python | Stargazers: 5 | Issues: 0

music-text-representation-pp

Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]

Language: Python | Stargazers: 15 | Issues: 0

Cacophony

Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986

Language: Python | License: MIT | Stargazers: 21 | Issues: 0

open-interpreter

A natural language interface for computers

Language: Python | License: AGPL-3.0 | Stargazers: 50120 | Issues: 0

ML-from-scratch-seminar

This repository is part of a "Machine Learning from Scratch" seminar at Harvard Medical School.

Language: Jupyter Notebook | License: MIT | Stargazers: 239 | Issues: 0

umap

Uniform Manifold Approximation and Projection

Language: Python | License: BSD-3-Clause | Stargazers: 7155 | Issues: 0

ICASSP-2024-BEAFX-using-DDSP

GitHub repository for the paper accepted at ICASSP 2024: Blind estimation of audio effects using an auto-encoder approach and differentiable signal processing

Language: Jupyter Notebook | Stargazers: 10 | Issues: 0

Rank-N-Contrast

[NeurIPS 2023, Spotlight] Rank-N-Contrast: Learning Continuous Representations for Regression

Language: Python | Stargazers: 65 | Issues: 0

mini_edm

Minimal implementation of EDM (Elucidating the Design Space of Diffusion-Based Generative Models) on CIFAR-10 and MNIST

Language: Python | Stargazers: 26 | Issues: 0

ect

Consistency Models Made Easy

Language: Python | Stargazers: 151 | Issues: 0

DiffusionRet

[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Language: Python | License: Apache-2.0 | Stargazers: 105 | Issues: 0

MWAFM

Multi-Scale Attention for Audio Question Answering

Language: Python | Stargazers: 23 | Issues: 0

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | Stargazers: 7203 | Issues: 0

edm2

Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)

Language: Python | License: NOASSERTION | Stargazers: 397 | Issues: 0

Hybrid-Net

Real-time audio source separation, with lyrics, chord, and beat generation.

Language: Python | Stargazers: 635 | Issues: 0

ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Language: Go | License: MIT | Stargazers: 74985 | Issues: 0

gflownet

Generative Flow Networks - GFlowNet

Language: Python | License: Apache-2.0 | Stargazers: 132 | Issues: 0