worldveil / dejavu

Audio fingerprinting and recognition in Python

Invalid match and poor confidence

balnagy opened this issue · comments

Hi,

I would like to use this library to find the position within a single song, but it's not working at all for the song I'm using. I don't know whether the problem is the song or the algorithm, but even the tests are failing with invalid matches and zero confidence.

The song is very popular (Taylor Swift - Shake It Off), but it's licensed. You can try to get it with a few commands.

youtube-dl https://www.youtube.com/watch?v=nfWlot6h_JM
ffmpeg -i Taylor\ Swift\ -\ Shake\ It\ Off-nfWlot6h_JM.mp4 Taylor\ Swift\ -\ Shake\ It\ Off-nfWlot6h_JM.wav

Then I modified test_dejavu.sh to scan wav files and ran it. I used wav files because the mp3 had a strange length, while the wav seems to be OK.

Can you help me to fix this issue?

Thanks!

Could you please share the modifications to the script so that I or others can run it? Just so the issue is completely reproducible.

Sure, I will try and add more instructions.

  1. I modified this line: https://github.com/worldveil/dejavu/blob/master/test_dejavu.sh#L11 to
python dejavu.py fingerprint ./mp3/ wav
  2. I removed all the other mp3 files from the mp3 directory, so only my wav file remained there.
  3. I created a clean database named dejavu_test2 in the local MySQL.
  4. I created a dejavu.cnf config file like this:
{
    "database": {
        "host": "127.0.0.1",
        "user": "root",
        "passwd": "",
        "db": "dejavu_test2"
    }
}
  5. Then I ran ./test_dejavu.sh

Results

  • Confidence_*sec.png has only 0 values
  • matching_perc_*sec.png has only invalid values

Thanks!

@balnagy ah I see. There is no problem with Dejavu, but the testing framework has a bug where if the name of your track on disk includes an underscore (_), then the match will always be invalid because the strings compared will be different.

If you look in the results/dejavu-tests.log generated by the testing suite (as you should!), you'll see it is predicting the correct song, but the track name excludes the part following the last underscore:

file: Taylor Swift - Shake It Off-nfWlot6h_JM_69_3sec.wav
song: Taylor Swift - Shake It Off-nfWlot6h
song_result: Taylor Swift - Shake It Off-nfWlot6h_JM
invalid match

But if I extract a random 3-second segment from the Taylor Swift track to mytest.wav and use the command line tool, everything works fine:

$ python dejavu.py recognize file mytest.wav 
{'match_time': 0.3390800952911377, 'song_id': 1, 'confidence': 1326, 'song_name': 'Taylor Swift - Shake It Off-nfWlot6h_JM', 'offset': 646L}
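
The offset in that output is counted in spectrogram frames, not seconds. A minimal sketch of the conversion follows; the three constants are assumptions about the fingerprinting defaults (44100 Hz sample rate, 4096-sample window, 50% overlap), and offset_to_seconds is a hypothetical helper, so check the values against your own fingerprint settings before relying on the result.

```python
# Assumed fingerprinting defaults; verify them against your own configuration.
SAMPLE_RATE_HZ = 44100   # assumed sample rate
WINDOW_SIZE = 4096       # assumed FFT window size, in samples
OVERLAP_RATIO = 0.5      # assumed overlap between consecutive windows


def offset_to_seconds(offset_frames, fs=SAMPLE_RATE_HZ,
                      window=WINDOW_SIZE, overlap=OVERLAP_RATIO):
    """Convert an offset measured in spectrogram frames into seconds."""
    hop = window * (1 - overlap)  # samples between the starts of two frames
    return offset_frames * hop / fs


print(round(offset_to_seconds(646), 2))  # 30.0
```

Under these assumptions, an offset of 646 corresponds to about 30 seconds into the track.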

The problem is in how the testing suite derives the song name from the clip's file name. I didn't originally write the testing suite; a contributor was kind enough to make it. It needs some love, though. Long term, this obviously needs to be fixed.

In the short term, I might instead use 3-4 underscores (____) as a separator, which is even more hackish (ugh), but works as a temporary fix. In the even shorter term, you could change the filename part nfWlot6h_JM to nfWlot6hJM by removing the underscore.
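
To illustrate the underscore problem, here is a minimal sketch that reproduces the mismatch from the log and shows one possible alternative to changing the separator. Both functions are hypothetical helpers, not the test suite's actual code; the sketch only assumes that test clips are named <song>_<start>_<length>sec.<ext>, as in the log above.

```python
import os


def naive_song_name(clip_file):
    """Take everything before the first underscore.

    Breaks as soon as the song name itself contains an underscore.
    """
    stem = os.path.splitext(clip_file)[0]
    return stem.split("_")[0]


def robust_song_name(clip_file):
    """Strip only the trailing '<start>_<length>sec' fields, splitting from the right."""
    stem = os.path.splitext(clip_file)[0]
    name, _start, _length = stem.rsplit("_", 2)
    return name


clip = "Taylor Swift - Shake It Off-nfWlot6h_JM_69_3sec.wav"
print(naive_song_name(clip))   # Taylor Swift - Shake It Off-nfWlot6h
print(robust_song_name(clip))  # Taylor Swift - Shake It Off-nfWlot6h_JM
```

Splitting from the right keeps any underscores in the song name intact, because only the last two fields are treated as clip metadata.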

@worldveil, wow, thanks. I repeated the test after renaming the file to 1.wav, and now I get higher confidence (40-700), but the offset still doesn't match. I could imagine the song being repetitive, but none of the 5 samples matches, which I think is very unlikely.

DEBUG:root:--------------------------------------------------
DEBUG:root:file: 1_170_5sec.wav
DEBUG:root:song: 1
DEBUG:root:song_result: 1
DEBUG:root:correct match
DEBUG:root:query duration: 0.599
DEBUG:root:confidence: 146
DEBUG:root:song start_time: 170
DEBUG:root:result start time: 94.0
DEBUG:root:inaccurate match
DEBUG:root:--------------------------------------------------

Song: https://www.youtube.com/watch?v=nfWlot6h_JM&t=170s
Result: https://www.youtube.com/watch?v=nfWlot6h_JM&t=94s

@balnagy, apologies, I haven't had much time to look into these issues lately. Any progress or thoughts?

It's very hard to debug such a problem, so I gave up and implemented my own algorithm just to find the offset, since I know the song. And it's kind of weird, since if the offset is not precise, then it makes the whole result questionable.

Where would you start debugging?
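
For readers with the same known-song use case, one common way to locate a clip is cross-correlation. The sketch below is not the algorithm mentioned above (which was not shared); it is a minimal example assuming the song and the clip are mono NumPy arrays at the same sample rate. find_offset_seconds is a hypothetical name, and the demo uses synthetic noise instead of real audio.

```python
import numpy as np


def find_offset_seconds(song, clip, sample_rate):
    """Return the start time (in seconds) in `song` where `clip` correlates best."""
    n = len(song) + len(clip) - 1
    size = 1 << (n - 1).bit_length()  # next power of two, avoids circular wrap-around
    # Cross-correlation via FFT: corr[k] = sum over t of song[t + k] * clip[t]
    spectrum = np.fft.rfft(song, size) * np.conj(np.fft.rfft(clip, size))
    corr = np.fft.irfft(spectrum, size)
    # Keep only the lags where the clip lies entirely inside the song.
    valid = corr[: len(song) - len(clip) + 1]
    return int(np.argmax(valid)) / sample_rate


# Synthetic demo: 8 s of noise at 1 kHz, with a noisy 1 s clip cut at 3.5 s.
rng = np.random.default_rng(0)
sample_rate = 1000
song = rng.standard_normal(8 * sample_rate)
clip = song[3500:4500] + 0.1 * rng.standard_normal(1000)

print(find_offset_seconds(song, clip, sample_rate))  # 3.5
```

Plain cross-correlation is biased toward loud passages, and repeated or looped sections of real music will produce several peaks of similar height, so it is worth inspecting the runner-up peaks and not only the maximum.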

Not necessarily. The way music is produced now, many of the sounds are direct copies or looped clips, meaning that it might actually be legitimately ambiguous as to which loop the fingerprints matched to.

I would start with the test case script and make sure the algorithm is actually messing up and not just the test suite. The test suite was contributed by someone else and, while I applaud the effort, it leaves some room for improvement.

If that isn't it, you might just need to tweak the parameters of the hashing to ensure better offset matching.
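
One way to check whether a wrong offset is genuine loop ambiguity is to look at the runner-up alignments and not only the winner. The sketch below assumes that alignment is decided by voting on the difference between each matched hash's offset in the stored track and its offset in the query, which is the usual approach in landmark-based fingerprinting. The function names, the 0.8 threshold, and the match data are all hypothetical.

```python
from collections import Counter


def rank_offset_candidates(matches, top_n=3):
    """Rank alignments by votes.

    `matches` is an iterable of (db_offset, query_offset) pairs,
    one pair per hash that matched between the stored track and the query.
    """
    votes = Counter(db - query for db, query in matches)
    return votes.most_common(top_n)


def is_ambiguous(candidates, ratio=0.8):
    """True if the runner-up has at least `ratio` of the winner's votes."""
    return len(candidates) > 1 and candidates[1][1] >= ratio * candidates[0][1]


# Synthetic example: the same loop occurs at frames 100 and 400 of the track.
matches = (
    [(100 + i, i) for i in range(10)]   # 10 hashes agree on offset 100
    + [(400 + i, i) for i in range(9)]  # 9 hashes agree on offset 400
    + [(250, 3)]                        # one spurious match
)
candidates = rank_offset_candidates(matches)
print(candidates)                # [(100, 10), (400, 9), (247, 1)]
print(is_ambiguous(candidates))  # True
```

If the true offset shows up among the top candidates with nearly as many votes as the winner, the song is probably repeating itself; if it does not appear at all, the test suite or the hashing parameters are the more likely culprit.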