MTG / gaia

C++ library to apply similarity measures and classifications on the results of audio analysis, including Python bindings. Together with Essentia it can be used to compute high-level descriptions of music.

Home Page: http://essentia.upf.edu

Gaia SVM: weird accuracy scores

loretoparisi opened this issue · comments

I'm using the high-level audio feature extraction plus the Gaia classifiers, i.e., the built-in SVM models. I'm getting weird results on my audio dataset for the different classification tasks, such as voice_instrumental. I'm not sure whether this is due to the audio input format.

This is what the audio stream looks like according to ffprobe:

{
              "index": 0,
              "codec_name": "mp3",
              "codec_long_name": "MP3 (MPEG audio layer 3)",
              "codec_type": "audio",
              "codec_time_base": "1/44100",
              "codec_tag_string": "[0][0][0][0]",
              "codec_tag": "0x0000",
              "sample_fmt": "fltp",
              "sample_rate": "44100",
              "channels": 2,
              "channel_layout": "stereo",
              "bits_per_sample": 0,
              "r_frame_rate": "0/0",
              "avg_frame_rate": "0/0",
              "time_base": "1/14112000",
              "start_pts": 353600,
              "start_time": "0.025057",
              "duration_ts": 3707412480,
              "duration": "262.713469",
              "bit_rate": "128000",
              "tags": {
                "encoder": "Lavc58.18"
              }
            }
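To rule out input-format issues, one option is to decode everything to a fixed PCM format before running the extractor. A minimal sketch that builds the ffmpeg command from Python (the filenames and target parameters here are my assumptions, not requirements of the extractor):

```python
import subprocess

def ffmpeg_decode_cmd(src, dst, sample_rate=44100, channels=2):
    """Build an ffmpeg command decoding any input to 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ar", str(sample_rate),   # resample to a fixed rate
        "-ac", str(channels),      # force a fixed channel count
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        dst,
    ]

cmd = ffmpeg_decode_cmd("track.mp3", "track.wav")
# subprocess.run(cmd, check=True)  # uncomment to actually decode
print(" ".join(cmd))
```

This way every file entering the extractor has the same sample rate, channel layout, and sample format, which removes one variable when debugging classifier output.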

Can you give more details? What exactly is weird in the results? You can see the expected accuracy of the models here: http://acousticbrainz.org/datasets/accuracy

@dbogdanov hello! For example, on non-English songs I often get an instrumental prediction with very high probability (>0.9), even though the track is not instrumental. That is why I was wondering whether I'm wrong about the input file format.
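For context, this is roughly how the classifier output can be inspected. The JSON layout below (a `highlevel.voice_instrumental` node with `value`, `probability`, and `all`) reflects the extractor output I'm seeing, but treat the exact field names as assumptions; the values here are made up for illustration:

```python
import json

# Trimmed example of the high-level extractor's JSON output
# (structure assumed, values invented for illustration).
raw = """
{
  "highlevel": {
    "voice_instrumental": {
      "value": "instrumental",
      "probability": 0.93,
      "all": {"voice": 0.07, "instrumental": 0.93}
    }
  }
}
"""

result = json.loads(raw)
vi = result["highlevel"]["voice_instrumental"]
print(vi["value"], vi["probability"])  # the winning class and its probability
```

The reported `probability` is the classifier's confidence in the winning class, not an accuracy figure, so a >0.9 value on a wrong prediction is possible.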

Can't say for sure, but this may be a case of robustness issues, or the dataset we trained on may not cover non-English vocal music well enough.