morpheus65535 / bazarr

Bazarr is a companion application to Sonarr and Radarr. It manages and downloads subtitles based on your requirements. You define your preferences by TV show or movie and Bazarr takes care of everything for you.

Home Page: https://www.bazarr.media

Subtitle score is not correlating with matching results

ronaldheft opened this issue

Describe the bug

I’ve been having an ongoing issue where the subtitle selected by Bazarr is not the ideal subtitle. Often it grabs a subtitle that matches a different release, even when the exact matching release is available.

This often occurs when multiple subtitles show a 100% score. It appears that Bazarr isn’t attempting to match all of the metadata fields. For example, sometimes the scenename is available to match on the release_group, and that doesn’t appear to be used.

I’ve documented this all below with screenshots.

To Reproduce

  1. Manually search for a subtitle, where the scenename is available.
  2. Notice multiple results with a 100% score.
  3. Notice the release_group is not being used to match, as the release_group is showing a false match.
  4. When pulling the subtitles automatically, the incorrect result is chosen, as multiple results have a 100% score rating.

Expected behavior

All available metadata is used to calculate the score.

Screenshots
[4 screenshots attached to the original issue]

Software (please complete the following information):
Bazarr Version: 0.8.4.1
Sonarr Version: 2.0.0.5338
Radarr Version: 0.2.0.1450
Operating System: Linux-4.4.59+-x86_64-with (Docker)

If the hash matches, we don't look at any other criteria (except hearing impaired). That's the expected behavior and it's the same with Sub-Zero (we share base code).

So is that bad data on the subtitle provider? Some of these subtitles definitely do not match the release and are out of sync. Selecting the version with the correct release group returns subtitles in sync.

Unfortunately, some subtitle uploaders add a hash even if it doesn't match. We have no control over this.

That’s understandable. Could logic be added so that, if multiple results return a matching hash, additional metadata fields are used instead of selecting the first subtitle result?

Essentially calculate the score again on the subset of results matching the hash, but ignoring the hash and calculating off metadata only?

@pannal something that could be done?

@morpheus65535 Didn't look at the actual code, but looks like the other matches impact sorting, I'll play with bsplayer (which also has hash matching) and I'll let you know.
EDIT: I can reproduce it with bsplayer.

I've taken inspiration from subdivx for matching, and modified subliminal_patch.score with:

--- a/libs/subliminal_patch/score.py
+++ b/libs/subliminal_patch/score.py
@@ -81,7 +81,7 @@ def compute_score(matches, subtitle, video, hearing_impaired=None):
                     matches -= {"hash"}
     elif 'hash' in matches:
         logger.debug('%r: Hash not verifiable for this provider. Keeping it', subtitle)
-        matches &= {'hash'}
+        matches |= {'hash'}
 
     # handle equivalent matches
     if is_episode:

Now I have the right preference, but with crazy scores:
[screenshot]
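
To illustrate why this one-character change inflates the scores: `&=` intersects the match set with `{'hash'}`, discarding every other match, while `|=` is a union that leaves the set unchanged when `hash` is already in it, so the hash weight is added on top of all the other weights. A minimal sketch with made-up weights (not the values used by subliminal_patch; the hash weight is assumed to represent 100%):

```python
# Hypothetical weights for illustration only; not the values used by subliminal_patch.
WEIGHTS = {"hash": 100, "series": 50, "season": 15, "episode": 15,
           "release_group": 12, "format": 8}
MAX_SCORE = WEIGHTS["hash"]  # assumption: a hash-only match is treated as 100%


def score(matches):
    """Sum the weights of all matched fields."""
    return sum(WEIGHTS[m] for m in matches)


found = {"hash", "series", "season", "episode", "release_group"}

hash_only = found & {"hash"}   # what `matches &= {'hash'}` leaves behind
everything = found | {"hash"}  # what `matches |= {'hash'}` leaves behind

print(score(hash_only) / MAX_SCORE)   # 1.0: the other matches no longer count
print(score(everything) / MAX_SCORE)  # 1.92: hash weight stacked on the rest
```

With the original `&=`, two subtitles that both carry a hash match end up with identical scores, which is why the release group could not break the tie.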

Yup, confusing.
Well, there are many alternatives:

  • Take the current approach, and this is a known bug.
  • Use my brutal approach and deal with >100% scoring when hashes are matched (as a known new bug)
  • Same as previous but 'Disguise' the >100% as 100% in the UI
  • Return hash matching as an attribute and not part of the scoring, then ordering by (hash, score) descending.

EDIT: @morpheus65535 I'll leave the decision up to you, let me know if it's not the first one, so I can give it a try coding it.

What about making the hash optional? Something like using the scenename instead?

Wait, there is already code in place to counter this, because OpenSubtitles had the same issue YEARS ago: https://github.com/pannal/Sub-Zero.bundle/blob/master/Contents/Libraries/Shared/subliminal_patch/score.py#L60

If the provider has the necessary metadata to support hash checking ("series", "season", "episode", "format" for TV, "video_codec", "format" for movies), just enable the hash_verifiable flag for that provider and the subtitle class, and this gets fixed automatically.
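
A rough sketch of that check, written from the description above rather than copied from the Sub-Zero code. The function name `filter_hash` and its parameters are illustrative assumptions; only the required-metadata sets come from the comment:

```python
# Metadata that must also match before a hash match is trusted (from the comment above).
EPISODE_HASH_VALID_IF = {"series", "season", "episode", "format"}
MOVIE_HASH_VALID_IF = {"video_codec", "format"}


def filter_hash(matches, is_episode, hash_verifiable):
    """Return the match set after deciding whether the hash can be trusted."""
    matches = set(matches)
    if "hash" not in matches:
        return matches
    if not hash_verifiable:
        # Provider cannot be verified: the hash is kept and overrides the rest.
        return {"hash"}
    required = EPISODE_HASH_VALID_IF if is_episode else MOVIE_HASH_VALID_IF
    if required <= matches:
        # Supporting metadata agrees, so the hash is believed.
        return {"hash"}
    # Supporting metadata disagrees: drop the hash and score on the rest.
    return matches - {"hash"}


# A hash uploaded for the wrong release: the format does not match, so the hash is dropped.
print(filter_hash({"hash", "series", "season", "episode"}, True, True))
```

Note that in both "trusted" branches the result is still `{"hash"}` alone, so this check rejects bad hashes but does not by itself rank several subtitles that all pass it.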

@pannal that might be the case for bsplayer, but the OP is about OpenSubtitles, and it looks like {"series", "season", "episode", "format"} matches but won't pick the desired subtitle.

That's something to look into, then.
The scoring might not be ideal for such cases. Maybe we should ultimately revise it, but that's not an easy feat.

Edit: Well, when two subtitles have the same score, Bazarr could prioritize the one that matches the most metadata, which would be quite simple.

@pannal but it's dropped when

matches &= {'hash'}

EDIT: ignore this comment, I think I got what you mean.

I've added a secondary scoring method to latest bazarr development, that changes the sorting of subtitles based on (score_with_hash, score_without_hash). This might fix the issue.
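
The idea can be sketched as follows. This is an illustration of sorting by a `(score_with_hash, score_without_hash)` pair, not Bazarr's actual code; the `Candidate` class, the weights, and the release names are all made up for the example:

```python
from dataclasses import dataclass

# Hypothetical weights; a hash-only match is assumed to count as the maximum score.
WEIGHTS = {"hash": 100, "series": 50, "season": 15, "episode": 15,
           "release_group": 12, "format": 8}


@dataclass
class Candidate:
    release: str
    matches: frozenset

    @property
    def score_without_hash(self):
        # Score from metadata alone, ignoring any hash match.
        return sum(WEIGHTS[m] for m in self.matches - {"hash"})

    @property
    def score_with_hash(self):
        # A hash match overrides everything else, as in the existing scoring.
        if "hash" in self.matches:
            return WEIGHTS["hash"]
        return self.score_without_hash


def rank(candidates):
    """Sort best-first; the metadata-only score breaks ties between equal primary scores."""
    return sorted(candidates,
                  key=lambda c: (c.score_with_hash, c.score_without_hash),
                  reverse=True)


candidates = [
    Candidate("Show.S01E01.WEB-GRP_A",
              frozenset({"hash", "series", "season", "episode"})),
    Candidate("Show.S01E01.WEB-GRP_B",
              frozenset({"hash", "series", "season", "episode", "release_group"})),
]
best = rank(candidates)[0]
print(best.release)  # Show.S01E01.WEB-GRP_B
```

Both candidates keep the same primary score, so the displayed value can stay at 100%, while the one that also matches the release group is picked first.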

Just pulled down the latest development release, and my results are way better! I'm now seeing the correct subtitle selected if there is an exact match.

[screenshot, 2020-02-16]

I like the approach of doing a secondary sort and keeping the UI at 100% score. If you're considering a hash match a 100% match, then yeah, it makes sense to keep the score at 100% and then from there just pick the best of the bunch.

Thanks for the quick resolution!

Also, we could add subsync: it would help when the score is low, or it could be used to check whether the subs are good.