Ajatt-Tools / impd

Long story short. I have a video file with a bunch of internal subs:

impd probe output:

Index  Language  Title                                        Type
0      unknown   Www.SeiresHD.Com                             video
1      spa       unknown                                      audio
2      eng       unknown                                      audio
3      spa       Spanish - (Caption/Normal Size Char)         subtitle
4      eng       English - (Closed Caption/Normal Size Char)  subtitle
5      unknown   unknown                                      subtitle

When I add the video to my collection, impd condenses it using subtitle track 5, which is the track for songs and other sounds. I think impd should choose internal subs based on:

Target language
Size of subtitles

So the largest subtitle track in the target language should be chosen for condensing. What do you think?

Edit your config file and add the following lines:

langs=spa
prefer_internal_subs=yes

My config:

langs=spanish,spa,esp,lat,cas
prefer_internal_subs=yes
video_dir=/dev/null
bitrate=32k
recent_threshold=10
padding=0.5
line_skip_pattern="^♪〜$|^〜♪$"
filename_skip_pattern="NCOP|NCED"
extract_audio_add_args=()

Try it yourself with any file from this folder: https://mega.nz/folder/oW8ihKCZ#sHuu63kset-BAn-XqFa7Nw

Condensing doesn't work though. I guess that's because of the bmp fonts, but it's not critical.

I guess you need to manually set what tracks you want because the tracks are incorrectly named.

language: spa

Where are they incorrectly named?

For example, here it chooses the Forzados subtitle track (index 3), while it should choose track 4:

Index  Language  Title     Type
0      unknown   unknown   video
1      spa       unknown   audio
2      eng       unknown   audio
3      spa       Forzados  subtitle
4      spa       unknown   subtitle
5      eng       unknown   subtitle

Can you add something to detect the largest target-language sub track?

impd chooses the first track that is:

not a song, caption, commentary, etc.
matches the preferred language

See guess_track_priority() in impd/impd, line 111 at commit 48535fb.
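The selection rule described above could be sketched roughly as follows. This is not impd's actual guess_track_priority(); the function name first_matching_track, the "index,language,title" input format, and the keyword list are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT impd's real implementation.
# Reads "index,language,title" lines on stdin and prints the index of the
# first track whose language equals $1 and whose title does not look like
# a song/caption/commentary track.
first_matching_track() {
    local want_lang="$1" idx lang title
    # Assumed keyword list, based on "not a song, caption, commentary, etc."
    local skip_pattern='song|caption|comment'
    while IFS=, read -r idx lang title; do
        [[ "$lang" == "$want_lang" ]] || continue
        # Case-insensitive match against the track title.
        if printf '%s\n' "$title" | grep -Eiq "$skip_pattern"; then
            continue
        fi
        printf '%s\n' "$idx"
        return 0
    done
    return 1  # no suitable track found
}

# The second probe table from this thread, reduced to subtitle tracks.
printf '%s\n' '3,spa,Forzados' '4,spa,unknown' '5,eng,unknown' |
    first_matching_track spa   # prints 3
```

With a first-match rule like this, the Forzados track wins simply because it comes first, which is the behavior reported above.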

Can you add something to detect the largest target-language sub track?

Based on the number of characters? If so, that's a good idea, but I'm not sure it's easy to do.

@tatsumoto-ren
You can use something like:

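function subs() {
    # Extract every subtitle track to an .srt file, then print the index
    # of the Spanish track with the most lines.
    local movie="$1" name idx lang
    name="$(basename "${movie%.*}")"   # strip directory and extension
    mkdir -p /tmp/impd_subs
    while IFS=, read -r idx lang; do
        # Progress goes to stderr so that stdout carries only the result.
        echo "Extracting ${lang} subtitle #${idx} from ${movie}" >&2
        ffmpeg -nostdin -y -hide_banner -loglevel quiet -i "${movie}" -map 0:"$idx" "/tmp/impd_subs/${name}_${lang}_${idx}.srt"
    done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}")
    # Sort numerically by line count; a plain "sort -r" compares text, so 900 would outrank 1000.
    wc --total=never -l /tmp/impd_subs/*.srt | grep "_spa_" | sort -nr | awk -F_ '{print $NF}' | awk -F. '{print $1}' | head -n1
}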

This outputs the number of the largest track.

(Main snippet found here: https://gist.github.com/kowalcj0/ae0bdc43018e2718fb75290079b8839a)

Or much simpler:

while IFS=',' read -r idx lang; do printf '%s ' "$idx" && ffmpeg -nostdin -hide_banner -loglevel quiet -i "la_directora_S01E02.mkv" -map 0:"$idx" -f srt - | wc -l; done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "la_directora_S01E02.mkv" | grep ",spa") | sort -nrk2,2 | head -n1 | awk '{print $1}'

This outputs the number of the largest track.
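For reuse, the same pipeline could be wrapped in a function that takes the file and the language as arguments instead of hard-coding them. A minimal sketch, assuming ffprobe and ffmpeg are on PATH; the name largest_sub_track is made up here and is not part of impd.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the one-liner above.
# Prints the stream index of the subtitle track in language $2 that has the
# most lines when converted to SRT.
largest_sub_track() {
    local movie="$1" want_lang="$2" idx lang lines
    while IFS=, read -r idx lang; do
        # Exact language match; tracks without a language tag are skipped.
        [[ "$lang" == "$want_lang" ]] || continue
        lines=$(ffmpeg -nostdin -hide_banner -loglevel quiet \
                    -i "$movie" -map 0:"$idx" -f srt - | wc -l)
        printf '%s %s\n' "$idx" "$lines"
    done < <(ffprobe -loglevel error -select_streams s \
                 -show_entries stream=index:stream_tags=language \
                 -of csv=p=0 "$movie") |
        sort -k2,2nr | head -n1 | awk '{print $1}'
}

# Usage: largest_sub_track "la_directora_S01E02.mkv" spa
```

Counting lines of the SRT output is only a proxy for track size. Bitmap-based subtitle tracks cannot be converted to SRT, so they would count as zero lines here.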

How fast does it work for a typical episode?

For a 379 MB mkv file, the output of time for this snippet on my old laptop is: 0.32s user 0.34s system 108% cpu 0.607 total

Alright, if it's not too slow (it needs to be tested on anime specifically), you can submit a PR. But you also need to think about the following:

only apply this method to subtitle tracks; audio tracks can still be autoselected using the current method.
filter out (or give lower priority to) commentary tracks, since they contain garbage yet can be longer than normal subtitle tracks.
filter out all other garbage tracks (songs, signs, comments), though they are likely to be shorter than the normal subtitle tracks.
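The requirements above could be combined into a single selection step along these lines. This is only a sketch: the function name pick_sub_track, the "index,language,line_count,title" input format, and the keyword lists (including treating forced/Forzados tracks as garbage) are assumptions, not existing impd behavior.

```shell
#!/usr/bin/env bash
# Hypothetical selection step for subtitle tracks only.
# Reads "index,language,line_count,title" lines on stdin and prints the index
# of the largest track in language $1, dropping garbage tracks and ranking
# commentary tracks below everything else.
pick_sub_track() {
    local want_lang="$1"
    awk -F, -v lang="$want_lang" '
        $2 != lang { next }                        # wrong language
        {
            title = tolower($4)
            # Assumed garbage keywords: dropped outright.
            if (title ~ /song|sign|forced|forzados/) next
            # Commentary is kept, but with a penalty.
            penalty = (title ~ /comment/) ? 1 : 0
            print penalty, $3, $1
        }
    ' | sort -k1,1n -k2,2nr | head -n1 | awk '{print $3}'
}

printf '%s\n' \
    '3,spa,40,Forzados' \
    '4,spa,900,unknown' \
    '5,eng,950,unknown' \
    '6,spa,1500,Commentary' |
    pick_sub_track spa   # prints 4
```

Sorting by penalty first and line count second means a commentary track is chosen only when no normal track exists in the target language.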

impd should automatically choose right internal subs