MTG / gaia

C++ library to apply similarity measures and classifications on the results of audio analysis, including Python bindings. Together with Essentia it can be used to compute high-level descriptions of music.

Home Page: http://essentia.upf.edu

Gaia SVM: weird accuracy scores

loretoparisi opened this issue · comments

I'm using the high-level audio feature extraction plus the Gaia classifiers, i.e., the built-in SVM models. I'm getting weird results on my audio dataset for the different classification tasks, such as voice_instrumental. I'm not sure whether this is due to the audio input format.

This is what the audio stream looks like according to ffprobe:

{
              "index": 0,
              "codec_name": "mp3",
              "codec_long_name": "MP3 (MPEG audio layer 3)",
              "codec_type": "audio",
              "codec_time_base": "1/44100",
              "codec_tag_string": "[0][0][0][0]",
              "codec_tag": "0x0000",
              "sample_fmt": "fltp",
              "sample_rate": "44100",
              "channels": 2,
              "channel_layout": "stereo",
              "bits_per_sample": 0,
              "r_frame_rate": "0/0",
              "avg_frame_rate": "0/0",
              "time_base": "1/14112000",
              "start_pts": 353600,
              "start_time": "0.025057",
              "duration_ts": 3707412480,
              "duration": "262.713469",
              "bit_rate": "128000",
              "tags": {
                "encoder": "Lavc58.18"
              }
            }
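To rule out input-format issues, one option is to decode everything to a fixed PCM format before running the extractor. A minimal sketch that builds the ffmpeg command from Python (the filenames and target parameters here are my assumptions, not requirements of the extractor):

```python
import subprocess

def ffmpeg_decode_cmd(src, dst, sample_rate=44100, channels=2):
    """Build an ffmpeg command decoding any input to 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ar", str(sample_rate),   # resample to a fixed rate
        "-ac", str(channels),      # force a fixed channel count
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        dst,
    ]

cmd = ffmpeg_decode_cmd("track.mp3", "track.wav")
# subprocess.run(cmd, check=True)  # uncomment to actually decode
print(" ".join(cmd))
```

This way every file entering the extractor has the same sample rate, channel layout, and sample format, which removes one variable when debugging classifier output.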

Can you give more details? What exactly is weird in the results? You can see the expected accuracy of the models here: http://acousticbrainz.org/datasets/accuracy

@dbogdanov hello! For example, on non-English songs I often get an instrumental prediction with very high probability (>0.9), even though the track is not instrumental. That is why I was wondering whether I'm wrong about the input file format.
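For context, this is roughly how the classifier output can be inspected. The JSON layout below (a `highlevel.voice_instrumental` node with `value`, `probability`, and `all`) reflects the extractor output I'm seeing, but treat the exact field names as assumptions; the values here are made up for illustration:

```python
import json

# Trimmed example of the high-level extractor's JSON output
# (structure assumed, values invented for illustration).
raw = """
{
  "highlevel": {
    "voice_instrumental": {
      "value": "instrumental",
      "probability": 0.93,
      "all": {"voice": 0.07, "instrumental": 0.93}
    }
  }
}
"""

result = json.loads(raw)
vi = result["highlevel"]["voice_instrumental"]
print(vi["value"], vi["probability"])  # the winning class and its probability
```

The reported `probability` is the classifier's confidence in the winning class, not an accuracy figure, so a >0.9 value on a wrong prediction is possible.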

Can't say for sure, but this may be a case of robustness issues, or the dataset we trained on may not cover non-English vocal music well enough.