Subtitle score is not correlating with matching results

Question

ronaldheft opened this issue 5 years ago

Describe the bug

I’ve been having an ongoing issue where the subtitle selected by Bazarr is not the ideal subtitle. Often it grabs a subtitle that matches a different release, even when the exact matching release is available.

This often occurs when multiple subtitles show a 100% score. It appears that Bazarr isn’t attempting to match all of the metadata fields. For example, sometimes the scenename is available to match on the release_group, and that doesn’t appear to be used.

I’ve documented all of this below with screenshots.

To Reproduce

Manually search for a subtitle, where the scenename is available.
Notice multiple results with a 100% score.
Notice the release_group is not being used to match, as the release_group is showing a false match.
When pulling the subtitles automatically, the incorrect result is chosen, as multiple results have a 100% score rating.

Expected behavior

All available metadata is used to calculate the score.

Screenshots

Software (please complete the following information):
Bazarr Version: 0.8.4.1
Sonarr Version: 2.0.0.5338
Radarr Version: 0.2.0.1450
Operating System: Linux-4.4.59+-x86_64-with (Docker)

morpheus65535 · Answer 1 · Sat Feb 15 2020 07:43:22 GMT+0800 (China Standard Time)

If the hash matches, we don't look at other criteria (except hearing impaired). That's the expected behavior and it's the same with Sub-Zero (we share base code).

Ron Heft · Answer 2 · Sat Feb 15 2020 07:44:45 GMT+0800 (China Standard Time)

So is that bad data on the subtitle provider? Some of these subtitles definitely do not match the release and are out of sync. Selecting the version with the correct release group returns subtitles in sync.

morpheus65535 · Answer 3 · Sat Feb 15 2020 07:47:56 GMT+0800 (China Standard Time)

Unfortunately, some subtitle uploaders add a hash even if it doesn't match. We have no control over this.

Ron Heft · Answer 4 · Sat Feb 15 2020 07:53:10 GMT+0800 (China Standard Time)

That’s understandable. Could logic be added so that, if multiple results return a matching hash, additional metadata fields are used instead of selecting the first subtitle result?

Ron Heft · Answer 5 · Sat Feb 15 2020 07:54:26 GMT+0800 (China Standard Time)

Essentially calculate the score again on the subset of results matching the hash, but ignoring the hash and calculating off metadata only?

morpheus65535 · Answer 6 · Sat Feb 15 2020 09:29:27 GMT+0800 (China Standard Time)

@pannal something that could be done?

German Gutierrez · Answer 7 · Sat Feb 15 2020 16:18:17 GMT+0800 (China Standard Time)

@morpheus65535 Didn't look at the actual code, but looks like the other matches impact sorting, I'll play with bsplayer (which also has hash matching) and I'll let you know.
EDIT: I can reproduce it with bsplayer.

German Gutierrez · Answer 8 · Sat Feb 15 2020 17:16:28 GMT+0800 (China Standard Time)

I've ~~stealed~~ taken inspiration from subdivx for matching, and modified subliminal_patch.score with:

--- a/libs/subliminal_patch/score.py
+++ b/libs/subliminal_patch/score.py
@@ -81,7 +81,7 @@ def compute_score(matches, subtitle, video, hearing_impaired=None):
                     matches -= {"hash"}
     elif 'hash' in matches:
         logger.debug('%r: Hash not verifiable for this provider. Keeping it', subtitle)
-        matches &= {'hash'}
+        matches |= {'hash'}
 
     # handle equivalent matches
     if is_episode:

Now I have the right preference, but with crazy scores:
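
As a rough illustration of why that one-character change produces scores above 100%, here is a minimal sketch using plain Python sets. The weights and the `score` helper are hypothetical, chosen so that a hash match alone equals the maximum score; they are not the values used in subliminal_patch.

```python
# Hypothetical weights for illustration only; a hash match alone is worth
# as much as all other fields combined (100 = "100%").
WEIGHTS = {
    "hash": 100,
    "series": 50,
    "season": 15,
    "episode": 15,
    "release_group": 12,
    "format": 8,
}


def score(matches):
    """Sum the weights of every matched field."""
    return sum(WEIGHTS[m] for m in matches)


# Fields a provider reported as matching for one subtitle.
found = {"hash", "series", "season", "episode", "release_group"}

# `matches &= {'hash'}` is a set intersection: only the hash survives.
kept_hash_only = found & {"hash"}

# `matches |= {'hash'}` is a set union: every other match is kept as well.
kept_everything = found | {"hash"}

print(score(kept_hash_only))   # 100
print(score(kept_everything))  # 192
```

With the intersection, every hash-matched subtitle collapses to the same score, so ties are broken by result order. With the union, the extra fields separate the candidates, but the total can exceed the nominal maximum.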

German Gutierrez · Answer 9 · Sat Feb 15 2020 21:39:59 GMT+0800 (China Standard Time)

Yup, confusing.
Well, there are many alternatives:

Keep the current approach and treat this as a known bug.
Use my brutal approach and deal with >100% scoring when hashes are matched (as a new known bug).
Same as the previous one, but "disguise" the >100% as 100% in the UI.
Return hash matching as an attribute rather than as part of the score, then order by (hash, score) descending.

EDIT: @morpheus65535 I'll leave the decision up to you, let me know if it's not the first one, so I can give it a try coding it.

morpheus65535 · Answer 10 · Sat Feb 15 2020 22:26:30 GMT+0800 (China Standard Time)

What about making hash optional? Something like use scenename?

pannal · Answer 11 · Sat Feb 15 2020 22:40:09 GMT+0800 (China Standard Time)

Wait, there is already code in place to counter this, because OpenSubtitles had the same issue YEARS ago: https://github.com/pannal/Sub-Zero.bundle/blob/master/Contents/Libraries/Shared/subliminal_patch/score.py#L60

If the provider has the necessary metadata to support hash checking ("series", "season", "episode", "format" for TV, "video_codec", "format" for movies), just enable the hash_verifiable flag for that provider and the subtitle class, and this gets fixed automatically.
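
The hash check described above can be sketched roughly as follows. This is a simplified illustration, not the code behind the link: the function name `verify_hash` and its parameters are assumptions, and only the required field sets come from the comment above.

```python
# Fields that must also match before a hash is trusted, as listed above.
EPISODE_HASH_VALID_IF = {"series", "season", "episode", "format"}
MOVIE_HASH_VALID_IF = {"video_codec", "format"}


def verify_hash(matches, is_episode, hash_verifiable):
    """Return a copy of `matches` with an untrustworthy hash removed.

    `hash_verifiable` stands in for the provider/subtitle flag; the
    function itself is a hypothetical helper for illustration.
    """
    matches = set(matches)
    if "hash" not in matches or not hash_verifiable:
        # Nothing to verify, or the provider cannot support verification.
        return matches

    required = EPISODE_HASH_VALID_IF if is_episode else MOVIE_HASH_VALID_IF
    if required <= matches:
        # All supporting fields agree, so the hash is considered valid.
        return matches

    # Supporting fields disagree: treat the hash as bad uploader data.
    return matches - {"hash"}


good = verify_hash({"hash", "series", "season", "episode", "format"}, True, True)
bad = verify_hash({"hash", "series"}, True, True)
print("hash" in good, "hash" in bad)  # True False
```

A hash that fails the check is dropped, so that subtitle is scored on its remaining metadata instead of being treated as a perfect match.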

German Gutierrez · Answer 12 · Sun Feb 16 2020 00:13:11 GMT+0800 (China Standard Time)

@pannal that might be the case for bsplayer, but the OP is about OpenSubtitles, and it looks like {"series", "season", "episode", "format"} matches but won't pick the desired subtitle.

pannal · Answer 13 · Sun Feb 16 2020 00:22:53 GMT+0800 (China Standard Time)

That's something to look into, then.
The scoring might not be ideal for such cases. Maybe we should ultimately revise it, but that's not an easy feat.

Edit: Well, when two subtitles have the same score, Bazarr could prioritize the one that matches the most metadata, which would be quite simple.

German Gutierrez · Answer 14 · Sun Feb 16 2020 00:25:12 GMT+0800 (China Standard Time)

@pannal but it's dropped when

matches &= {'hash'}

EDIT: ignore this comment, I think I got what you mean.

pannal · Answer 15 · Sun Feb 16 2020 12:55:58 GMT+0800 (China Standard Time)

I've added a secondary scoring method to the latest Bazarr development version; it sorts subtitles by (score_with_hash, score_without_hash). This might fix the issue.
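
A minimal sketch of that kind of two-level sort, assuming hypothetical weights and a made-up `Candidate` type (none of these names are taken from the Bazarr code):

```python
from dataclasses import dataclass

# Hypothetical weights; a hash match alone counts as the maximum score.
WEIGHTS = {"series": 50, "season": 15, "episode": 15,
           "release_group": 12, "format": 8}
MAX_SCORE = 100


@dataclass(frozen=True)
class Candidate:
    name: str
    matches: frozenset


def score_without_hash(c):
    """Score from metadata only, ignoring any hash match."""
    return sum(WEIGHTS[m] for m in c.matches if m != "hash")


def score_with_hash(c):
    """A hash match is shown as 100%; otherwise fall back to metadata."""
    return MAX_SCORE if "hash" in c.matches else score_without_hash(c)


def sort_key(c):
    # Primary key keeps the displayed score; secondary key breaks ties.
    return (score_with_hash(c), score_without_hash(c))


candidates = [
    Candidate("A", frozenset({"hash", "series", "season", "episode"})),
    Candidate("B", frozenset({"hash", "series", "season", "episode",
                              "release_group", "format"})),
    Candidate("C", frozenset({"series", "season", "episode",
                              "release_group"})),
]

ranked = sorted(candidates, key=sort_key, reverse=True)
print([c.name for c in ranked])  # ['B', 'A', 'C']
```

Both A and B still display as 100%, but B is picked first because its release group and format also match.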

Ron Heft · Answer 16 · Mon Feb 17 2020 03:52:43 GMT+0800 (China Standard Time)

Just pulled down the latest development release, and my results are way better! I'm now seeing the correct subtitle selected if there is an exact match.

I like the approach of doing a secondary sort and keeping the UI at 100% score. If you're considering a hash match a 100% match, then yeah, it makes sense to keep the score at 100% and then from there just pick the best of the bunch.

Thanks for the quick resolution!

rigas40 · Answer 17 · Sun Feb 23 2020 08:57:26 GMT+0800 (China Standard Time)

We could also add subsync: it would help when the score is low,
or it could be used to check whether the subs are good.