wayne391

Wen-Yi Hsiao's starred repositories

fucking-algorithm

Solving algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.

Language: Markdown | 124743 2305 829

yt-dlp

A feature-rich command-line audio/video downloader

Language: Python | License: Unlicense | 80829 497 7602

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Language: Python | License: AGPL-3.0 | 27631 154 8431

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | 13085 115 983

TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Language: C++ | License: Apache-2.0 | 10476 156 3647

wavesurfer.js

Audio waveform player

Language: TypeScript | License: BSD-3-Clause | 8567 167 2094

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | 7994 87 1739

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python | License: Apache-2.0 | 6076 37 292

latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Language: Python | License: MIT | 4260 63 93

Segment-and-Track-Anything

An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for key-frame segmentation and Associating Objects with Transformers (AOT) for efficient tracking and propagation purposes.

Language: Jupyter Notebook | License: AGPL-3.0 | 2744 51 153

audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

1874 169 4

XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Language: Python | License: MIT | 1699 22 127

waveform-playlist

Multitrack Web Audio editor and player with canvas waveform preview. Set cues, fades and shift multiple tracks in time. Record audio tracks or provide audio annotations. Export your mix to AudioBuffer or WAV! Add effects from Tone.js. Project inspired by Audacity.

Language: JavaScript | License: MIT | 1448 65 132

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | 1305 28 86

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | 1084 26 72

MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶

Language: Python | License: MIT | 646 8 85

Era3D

Language: Python | License: AGPL-3.0 | 487 17 36

all-in-one

All-In-One Music Structure Analyzer

Language: Python | License: MIT | 391 9 12

BeatNet

BeatNet is a state-of-the-art real-time and offline system for joint music beat, downbeat, tempo, and meter tracking using CRNN and particle filtering (implementation of the ISMIR 2021 paper).

Language: Python | License: CC-BY-4.0 | 311 9 27

llark

Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Language: Jupyter Notebook | License: NOASSERTION | 287 7 7

lp-music-caps

LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]

Language: Python | 265 8 9

WavCaps

This repository contains metadata for the WavCaps dataset and code for downstream tasks.

Language: Python | 191 5 26

ffmpeg-scripts

ffmpeg shell scripts

Language: Shell | License: BSD-3-Clause | 188 11 6

360monodepth

Code release for 360monodepth. With our framework we achieve monocular depth estimation for high resolution 360° images based on aligning and blending perspective depth maps.

Language: Python | License: MIT | 147 5 27

AQUA-Tk

AQUA-Tk = Audio QUality Assessment-Toolkit. (In development)

Language: Python | License: GPL-3.0 | 93 3 3

demucs.cpp

C++17 port of Demucs v3 (hybrid) and v4 (hybrid transformer) models with ggml and Eigen3

Language: C++ | License: MIT | 82 4 12

hFT-Transformer

PyTorch implementation of an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture (hFT-Transformer).

Language: Python | License: MIT | 70 3 2

nansypp

Unofficial implementation of NANSY++ in PyTorch Lightning

Language: Python | License: MIT | 45 8 3

coco-mulla-repo

Official source code of coco-mulla

Language: Python | 25 4 1

music-modeling-time-duration

Code for the paper "Impact of time and note duration tokenizations on deep learning symbolic music modeling" (ISMIR 2023)

Language: Python | 9 10