There is 1 repository under the audio-captioning topic.
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
Reading list for research topics in Sound AI
Audio Captioning datasets for PyTorch.
Using pretrained encoder and language models to generate captions from multimedia inputs.
Python code for handling the Clotho dataset.
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)
Song Describer is a data collection platform for annotating music with textual descriptions.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
Code base for WaveTransformer: A novel architecture for automated audio captioning
Audio captioning baseline system for DCASE 2020 challenge.
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"
Tracking the state of the art and recent results (bibliography) on sound tasks.
Official implementation of "Prefix tuning for Automated Audio Captioning" (ICASSP 2023)
2nd place solution for 2020 DCASE challenge task 6 audio captioning. http://dcase.community/challenge2020/task-automatic-audio-captioning-results#wuyusong2020_t6
Fluency ENhanced Sentence-bert Evaluation (FENSE), a metric for audio caption evaluation, plus the benchmark datasets AudioCaps-Eval and Clotho-Eval.
Tools for the evaluation of audio captioning.
[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Workshop on Detection and Classification of Acoustic Scenes and Events
Text-only or language-free training for multimodal tasks (image/audio/video captioning, retrieval, text2image)
Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems
DCASE2024 Challenge Task 6 baseline system (Automated Audio Captioning)
PyTorch dataloader for Clotho dataset.
Solution for Task 6 of DCASE 2020
Code for use with the Clotho dataset
This repository contains the code for "Weakly Supervised Automated Audio Captioning via Text Only Training"
IRIT-UPS DCASE 2021 audio captioning system