monetjoe / Piano-Classification

This study converts piano recordings to mel spectrograms and classifies them with fine-tuned, state-of-the-art pre-trained neural network backbones from computer vision. Comparative experiments show that SqueezeNet performs best, reaching a classification accuracy of 92.37%.

Home Page: https://arxiv.org/abs/2310.04722

Piano-Classification

Classify piano sound quality by fine-tuned pre-trained CNN models.

Requirements

conda create -n cnn --yes --file conda.txt
conda activate cnn
pip install -r requirements.txt

Usage

Maintenance

git clone git@github.com:monetjoe/Piano-Classification.git
cd Piano-Classification

Train

Pass a backbone name after --model to start training, for example squeezenet1_1:

python train.py --model squeezenet1_1 --fullfinetune True --fl True

--fullfinetune True selects full fine-tuning; False selects linear probing, which trains only the classifier on top of the frozen backbone
--fl True enables focal loss
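
To make the --fl option more concrete, here is a minimal, framework-free sketch of focal loss for a single sample. It is not the code used in train.py: the function names, the default gamma=2.0, and the omission of per-class weighting are all assumptions made for illustration. The idea is that the usual cross-entropy term is scaled by (1 - p_t)^gamma, so samples the model already classifies confidently contribute much less to the loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default, not a value taken from this repository.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, which is focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # target class 0 is predicted confidently
    hard = [0.0, 1.0, 0.0]  # target class 0 is not the top prediction
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

Running the script prints both losses for an easy and a hard sample; the focal loss of the easy sample shrinks far more, relative to its cross-entropy, than that of the hard sample.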

Supported backbones

Mirror 1
Mirror 2

Plot results

After training finishes, run the command below to plot the latest results:

python plot.py

Results

A demo result of SqueezeNet fine-tuning:

[Figures: loss curve; training and validation accuracy; confusion matrix]

Cite

@inproceedings{DBLP:journals/corr/abs-2310-04722,
  author    = {Monan Zhou and
               Shangda Wu and
               Shaohua Ji and
               Zijin Li and
               Wei Li},
  title     = {A Holistic Evaluation of Piano Sound Quality},
  booktitle = {Proceedings of the 10th Conference on Sound and Music Technology (CSMT)},
  year      = {2023},
  publisher = {Springer Singapore},
  address   = {Singapore},
  timestamp = {Fri, 20 Oct 2023 12:04:38 +0200},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

About

License: MIT License


Languages: Python 100.0%