microsoft / Pengi

An Audio Language model for Audio Tasks

Home Page: https://arxiv.org/abs/2305.11834


Number of files in the TUT 2017 dataset

Zth9730 opened this issue

Table 2 in the paper says that TUT 2017 contains 6.3k files split into training, testing, and validation sets. But when I download TUT 2017, I only get 312 × 15 = 4,680 files, and there is only a development set. Why does the table list 6.3k files, and how are they divided into training, testing, and validation sets?

Hi @Zth9730, TUT 2017 has a total of 6.3k files. It is divided into development and evaluation sets, which can be obtained from the links below:

Table 2 in the paper shows the full dataset statistics. For Pengi's zero-shot evaluation, we use only the evaluation set so that our numbers are comparable with other zero-shot and supervised benchmarks. I hope this helps!
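To see how the two counts in this thread fit together, here is a minimal arithmetic sketch. It assumes that "6.3k" in Table 2 means exactly 6,300 files and that the evaluation set covers the same 15 scene classes evenly; both are assumptions for illustration, not figures confirmed in this thread. Under those assumptions the evaluation set would hold the remaining files not found in the development download.

```python
# Hedged sketch: reconcile the TUT 2017 file counts discussed in this issue.
# Assumption: "6.3k" in Table 2 means exactly 6,300 files in total.
# Assumption: the evaluation set is split evenly across the same 15 classes.
NUM_SCENE_CLASSES = 15        # scene classes, as in the 312 x 15 count above
DEV_FILES_PER_CLASS = 312     # per-class count reported for the development set
TOTAL_FILES = 6_300           # assumed exact value of "6.3k"

# Development set size, as computed in the question.
dev_files = NUM_SCENE_CLASSES * DEV_FILES_PER_CLASS

# Whatever remains of the total is attributed to the evaluation set.
eval_files = TOTAL_FILES - dev_files
eval_files_per_class, leftover = divmod(eval_files, NUM_SCENE_CLASSES)

print(f"development: {dev_files}")
print(f"evaluation (inferred): {eval_files} ({eval_files_per_class} per class, {leftover} left over)")
```

If the assumptions hold, the development download accounts for 4,680 files and the evaluation set for the other 1,620, which together give the 6.3k reported in Table 2.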

Thanks a lot!!!